Description



BIOINFORMATICALLY DETECTABLE 
GROUP OF NOVEL REGULATORY VIRAL 
AND VIRAL ASSOCIATED 
OLIGONUCLEOTIDES AND USES THEREOF 

CROSS REFERENCE TO RELATED APPLICATIONS 

[0001] This application is a continuation in part of and claims 

priority from the following patent applications, the disclo- 
sures of which applications are all hereby incorporated 
herein by reference: U.S. Patent Application Serial No. 
10/604984 filed 29-Aug-03, U.S. Patent Application Se- 
rial No. 10/604945 filed 27-Aug-03, and U.S. Patent Ap- 
plication Serial No. 10/604942 filed 27-Aug-03. This ap- 
plication also claims priority from International Applica- 
tion Number: PCT/IL 03/009998, filed 26-Nov-03, the 
disclosure of which application is hereby incorporated 
herein by reference. All of the aforesaid patent applica- 
tions are entitled "Bioinformatically Detectable Group of 



Novel Viral Regulatory Genes and Uses Thereof"; This application
also is a continuation in part of and claims priority
from U.S. Patent Application Serial No. 10/708952 filed
2-Apr-04, entitled "Bioinformatically Detectable Group of 
Novel Viral Regulatory Oligonucleotides and Uses 
Thereof"; This application also is a continuation in part of
and claims priority from the following patent applications: 
U.S. Patent Application Serial No. 10/605838 filed 
30-Oct-03, and U.S. Patent Application Serial No. 
10/604944 filed 28-Aug-03. Both of the aforesaid patent 
applications are entitled "Bioinformatically Detectable 
Group of Novel HIV Regulatory Genes and Uses Thereof";
This application also is a continuation in part of and 
claims priority from the following patent applications: U.S. 
Patent Application Serial No. 10/605840 filed 30-Oct-03, 
and U.S. Patent Application Serial No. 10/604943 filed 
28-Aug-03. Both of the aforesaid patent applications are 
entitled "Bioinformatically Detectable Group of Novel Vaccinia
Regulatory Genes and Uses Thereof"; This application
also is a continuation in part of U.S. Provisional Patent
Application Serial No. 60/521,433 filed 26-Apr-04, enti- 
tled "A Microarray for the Detection of MicroRNA Oligonu- 
cleotides", the disclosure of which is hereby incorporated 



by reference and claims priority therefrom. U.S Patent Ap- 
plication Serial No. 10/708952, filed 2-Apr-04, entitled 
"Bioinformatically Detectable Group of Novel Viral Regulatory
Oligonucleotides and Uses Thereof" is a continuation
in part of and claims priority from the following patent 
applications, the disclosures of which applications are all 
hereby incorporated herein by reference: U.S. Patent Ap- 
plication Serial No. 10/604984 filed 29-Aug-03, U.S. 
Patent Application Serial No. 10/604945 filed 27-Aug-03, 
and U.S. Patent Application Serial No. 10/604942 filed 
27-Aug-03. This application also claims priority from In- 
ternational Application Number: PCT/IL 03/009998, filed 
26-Nov-03, the disclosure of which application is hereby
incorporated herein by reference. All of the aforesaid 
patent applications are entitled "Bioinformatically De- 
tectable Group of Novel Viral Regulatory Genes and Uses 
Thereof"; This application also is a continuation in part of 
and claims priority from the following patent applications: 
U.S. Patent Application Serial No. 10/605838 filed 
30-Oct-03, and U.S. Patent Application Serial No. 
10/604944 filed 28-Aug-03. Both of the aforesaid patent 
applications are entitled "Bioinformatically Detectable 
Group of Novel HIV Regulatory Genes and Uses Thereof";



This application also is a continuation in part of and 
claims priority from the following patent applications: U.S. 
Patent Application Serial No. 10/605840 filed 30-Oct-03, 
and U.S. Patent Application Serial No. 10/604943 filed 

28-Aug-03. Both of the aforesaid patent applications are
entitled "Bioinformatically Detectable Group of Novel Vaccinia
Regulatory Genes and Uses Thereof"; International
Application Number: PCT/IL 03/00998, filed 26-Nov-03, 
entitled "Bioinformatically Detectable Group of Novel Viral 
Regulatory Genes and Uses Thereof" claims priority from 
the following patent applications, the disclosures of which 
applications are all hereby incorporated herein by refer- 
ence: U.S. Patent Application Serial No. 10/604984 filed 

29-Aug-03, U.S. Patent Application Serial No. 10/604945
filed 27-Aug-03, U.S. Patent Application Serial No. 
10/604942 filed 27-Aug-03, and U.S. Provisional Patent 
Application Serial No. 60/457788 filed 27-Mar-03. All of 
the aforesaid patent applications are entitled "Bioinformat- 
ically Detectable Group of Novel Viral Regulatory Genes 
and Uses Thereof"; This application also claims priority
from the following patent applications: U.S. Patent Appli- 
cation Serial No. 10/605838 filed 30-Oct-03, and U.S. 
Patent Application Serial No. 10/604944 filed 28-Aug-03. 



Both of the aforesaid patent applications are entitled 
"Bioinformatically Detectable Group of Novel HIV Regulatory
Genes and Uses Thereof"; This application also claims
priority from the following patent applications: U.S. Patent 
Application Serial No. 10/605840 filed 30-Oct-03, U.S. 
Patent Application Serial No. 10/604943 filed 28-Aug-03, 
and U.S. Provisional Patent Application Serial No. 
60/441241 filed 17-Jan-03. All of the aforesaid patent 
applications are entitled "Bioinformatically Detectable 
Group of Novel Vaccinia Regulatory Genes and Uses 
Thereof"; U.S. Patent Application Serial No. 10/605840,
filed 30-Oct-03, entitled "Bioinformatically Detectable 
Group of Novel Vaccinia Regulatory Genes and Uses 
Thereof" is a continuation of and claims priority from the
following patent applications, the disclosures of which ap- 
plications are all hereby incorporated herein by reference: 
U.S. Patent Application Serial No. 10/604943 filed 
28-Aug-03, and U.S Provisional Patent Application Serial 
No. 60/441241 filed 17-Jan-03. Both of the aforesaid 
patent applications are entitled "Bioinformatically De- 
tectable Group of Novel Vaccinia Regulatory Genes and 
Uses Thereof"; This application also is a continuation in
part of and claims priority from the following patent ap- 



plications: U.S. Patent Application Serial No. 10/604984 
filed 29-Aug-03, U.S. Patent Application Serial No. 
10/604945 filed 27-Aug-03, U.S. Patent Application Se- 
rial No. 10/604942 filed 27-Aug-03, U.S Provisional 
Patent Application Serial No. 60/457788 filed 27-Mar-03, 
and U.S. Patent Application Serial No. 10/310188 filed 
5-Dec-02. All of the aforesaid patent applications are en- 
titled "Bioinformatically Detectable Group of Novel Viral 
Regulatory Genes and Uses Thereof"; This application also 
is a continuation in part of and claims priority from the 
following patent applications: U.S. Patent Application Se- 
rial No. 10/605838 filed 30-Oct-03, U.S. Patent Applica- 
tion Serial No. 10/604944 filed 28-Aug-03, and U.S Pro- 
visional Patent Application Serial No. 60/441230 filed 
16-Jan-03. All of the aforesaid patent applications are 
entitled "Bioinformatically Detectable Group of Novel HIV 
Regulatory Genes and Uses Thereof"; U.S Patent Applica- 
tion Serial No.10/605838, filed 30-Oct-03, entitled "Bioin- 
formatically Detectable Group of Novel HIV Regulatory 
Genes and Uses Thereof" is a continuation of and claims
priority from the following patent applications, the disclo- 
sures of which applications are all hereby incorporated 
herein by reference: U.S. Patent Application Serial No. 



10/604944 filed 28-Aug-03, and U.S Provisional Patent 
Application Serial No. 60/441230 filed 16-Jan-03. Both of 
the aforesaid patent applications are entitled "Bioinformatically
Detectable Group of Novel HIV Regulatory Genes and
Uses Thereof"; This application also is a continuation in
part of and claims priority from the following patent ap- 
plications: U.S. Patent Application Serial No. 10/604984 
filed 29-Aug-03, U.S. Patent Application Serial No. 
10/604945 filed 27-Aug-03, U.S. Patent Application Se- 
rial No. 10/604942 filed 27-Aug-03, U.S Provisional 
Patent Application Serial No. 60/457788 filed 27-Mar-03, 
and U.S. Patent Application Serial No. 10/310188 filed 
5-Dec-02. All of the aforesaid patent applications are en- 
titled "Bioinformatically Detectable Group of Novel Viral 
Regulatory Genes and Uses Thereof"; This application also
is a continuation in part of and claims priority from the 
following patent applications: U.S. Patent Application Se- 
rial No. 10/604943 filed 28-Aug-03, and U.S Provisional 
Patent Application Serial No. 60/441241 filed 17-Jan-03. 
Both of the aforesaid patent applications are entitled 
"Bioinformatically Detectable Group of Novel Vaccinia 
Regulatory Genes and Uses Thereof"; U.S. Patent Application
Serial No. 10/604984, filed 29-Aug-03, entitled



"Bioinformatically Detectable Group of Novel Viral Regulatory
Genes and Uses Thereof" is a continuation of U.S. Provisional
Patent Application Serial No. 60/457788, filed
27-Mar-03, entitled "Bioinformatically Detectable Group 
of Novel Regulatory Genes and Uses Thereof ", the disclo- 
sure of which is hereby incorporated herein and claims 
priority therefrom; and is a continuation in part of and 
claims priority from the following patent applications, the 
disclosures of which applications are all hereby incorporated
herein by reference: U.S. Patent Application Serial No. 10/604945 filed
27-Aug-03, U.S. Patent Application Serial No. 10/604942
filed 27-Aug-03, U.S. Patent Application Serial No. 
10/310188 filed 5-Dec-02, and U.S. Patent Application 
Serial No. 10/303778 filed 26-Nov-02. All of the afore- 
said patent applications are entitled "Bioinformatically De- 
tectable Group of Novel Viral Regulatory Genes and Uses 
Thereof"; This application also is a continuation in part of 
and claims priority from the following patent applications: 
U.S. Patent Application Serial No. 10/604944 filed 

28-Aug-03, and U.S. Provisional Patent Application Serial
No. 60/441230 filed 16-Jan-03. Both of the aforesaid 
patent applications are entitled "Bioinformatically De- 



tectable Group of Novel HIV Regulatory Genes and Uses 
Thereof"; This application also is a continuation in part of 
and claims priority from the following patent applications: 
U.S. Patent Application Serial No. 10/604943 filed 
28-Aug-03, and U.S Provisional Patent Application Serial 
No. 60/441241 filed 17-Jan-03. Both of the aforesaid 
patent applications are entitled "Bioinformatically De- 
tectable Group of Novel Vaccinia Regulatory Genes and 
Uses Thereof"; U.S. Patent Application Serial
No. 10/604943 filed 28-Aug-03, entitled "Bioinformati- 
cally Detectable Group of Novel Vaccinia Regulatory Genes 
and Uses Thereof" is a continuation of U.S. Provisional
Patent Application Serial No. 60/441241, filed 17-Jan-03, 
entitled "Bioinformatically Detectable Group of Novel Vac- 
cinia Regulatory Genes and Uses Thereof ", the disclosure 
of which is hereby incorporated herein and claims priority 
therefrom; This application also is a continuation in part 
of and claims priority from the following patent applica- 
tions: U.S. Patent Application Serial No. 10/604945 filed 
27-Aug-03, U.S. Patent Application Serial No. 10/604942 
filed 27-Aug-03, U.S Provisional Patent Application Serial 
No. 60/457788 filed 27-Mar-03, U.S. Patent Application 
Serial No. 10/310188 filed 5-Dec-02, and U.S. Patent Ap- 



plication Serial No. 10/303778 filed 26-Nov-02. All of the 
aforesaid patent applications are entitled "Bioinformati- 
cally Detectable Group of Novel Viral Regulatory Genes 
and Uses Thereof"; This application also is a continuation 
in part of and claims priority from the following patent 
applications: U.S. Patent Application Serial No. 10/604944 
filed 28-Aug-03, and U.S Provisional Patent Application 
Serial No. 60/441230 filed 16-Jan-03. Both of the aforesaid
patent applications are entitled "Bioinformatically Detectable
Group of Novel HIV Regulatory Genes and Uses
Thereof"; U.S. Patent Application Serial No. 10/604944,
filed 28-Aug-03, entitled "Bioinformatically Detectable 
Group of Novel HIV Regulatory Genes and Uses Thereof" is
a continuation of U.S Provisional Patent Application Serial 
No. 60/441230, filed 16-Jan-03, entitled "Bioinformati- 
cally Detectable Group of Novel HIV Regulatory Genes and 
Uses Thereof ", the disclosure of which is hereby incorpo- 
rated herein and claims priority therefrom; This applica- 
tion also is a continuation in part of and claims priority 
from the following patent applications: U.S. Patent Appli- 
cation Serial No. 10/604945 filed 27-Aug-03, U.S. Patent 
Application Serial No. 10/604942 filed 27-Aug-03, U.S 
Provisional Patent Application Serial No. 60/457788 filed 



27-Mar-03, U.S. Patent Application Serial No. 10/310188 
filed 5-Dec-02, and U.S. Patent Application Serial No. 
10/303778 filed 26-Nov-02. All of the aforesaid patent 
applications are entitled "Bioinformatically Detectable 
Group of Novel Viral Regulatory Genes and Uses Thereof";
This application also is a continuation in part of U.S Provi- 
sional Patent Application Serial No.60/441241, filed 
17-Jan-03, entitled "Bioinformatically Detectable Group of 
Novel Vaccinia Regulatory Genes and Uses Thereof", the
disclosure of which is hereby incorporated by reference 
and claims priority therefrom; U.S Patent Application Serial 
No.10/604945, filed 27-Aug-03, entitled "Bioinformati- 
cally Detectable Group of Novel Viral Regulatory Genes 
and Uses Thereof" is a continuation of U.S. Patent Application
Serial No. 10/303778, filed 26-Nov-02, entitled
"Bioinformatically Detectable Group of Novel Regulatory 
Genes and Uses Thereof", the disclosure of which is 
hereby incorporated herein and claims priority therefrom; 
and is a continuation in part of and claims priority from 
the following patent applications, the disclosures of which 
applications are all hereby incorporated herein by refer- 
ence: U.S. Patent Application Serial No. 10/604942 filed 
27-Aug-03, U.S Provisional Patent Application Serial No. 



60/457788 filed 27-Mar-03, and U.S. Patent Application 
Serial No. 10/310188 filed 5-Dec-02. All of the aforesaid 
patent applications are entitled "Bioinformatically Detectable
Group of Novel Viral Regulatory Genes and Uses
Thereof"; This application also is a continuation in part of
U.S Provisional Patent Application Serial No. 60/441230 
filed 16-Jan-03, entitled "Bioinformatically Detectable 
Group of Novel HIV Regulatory Genes and Uses Thereof",
the disclosure of which is hereby incorporated by refer- 
ence and claims priority therefrom; This application also is 
a continuation in part of U.S Provisional Patent Application 
Serial No.60/441241, filed 17-Jan-03, entitled "Bioinfor- 
matically Detectable Group of Novel Vaccinia Regulatory 
Genes and Uses Thereof", the disclosure of which is 
hereby incorporated by reference and claims priority 
therefrom; U.S Patent Application Serial No. 10/604942, 
filed 27-Aug-03, entitled "Bioinformatically Detectable 
Group of Novel Viral Regulatory Genes and Uses Thereof"
is a continuation of U.S Patent Application Serial No. 
10/310188, filed 5-Dec-02, entitled "Bioinformatically 
Detectable Group of Novel Regulatory Genes and Uses 
Thereof ", the disclosure of which is hereby incorporated 
herein and claims priority therefrom; and is a continuation 



in part of and claims priority from the following patent 
applications, the disclosures of which applications are all 
hereby incorporated herein by reference: U.S Provisional 
Patent Application Serial No. 60/457788 filed 27-Mar-03, 
and U.S. Patent Application Serial No. 10/303778 filed 
26-Nov-02. All of the aforesaid patent applications are
entitled "Bioinformatically Detectable Group of Novel Viral 
Regulatory Genes and Uses Thereof"; This application also
is a continuation in part of U.S Provisional Patent Applica- 
tion Serial No. 60/441230 filed 16-Jan-03, entitled "Bioin- 
formatically Detectable Group of Novel HIV Regulatory 
Genes and Uses Thereof", the disclosure of which is
hereby incorporated by reference and claims priority 
therefrom; This application also is a continuation in part 
of U.S Provisional Patent Application Serial No. 60/441241, 
filed 17-Jan-03, entitled "Bioinformatically Detectable 
Group of Novel Vaccinia Regulatory Genes and Uses 
Thereof", the disclosure of which is hereby incorporated 
by reference and claims priority therefrom; U.S Provisional 
Patent Application Serial No.60/457788, filed 27-Mar-03, 
entitled "Bioinformatically Detectable Group of Novel Viral 
Regulatory Genes and Uses Thereof" is a continuation in
part of and claims priority from the following patent ap- 



plications, the disclosures of which applications are all 
hereby incorporated herein by reference: U.S Patent Appli- 
cation Serial No. 10/310188 filed 5-Dec-02, and U.S. 
Patent Application Serial No. 10/303778 filed 26-Nov-02. 
Both of the aforesaid patent applications are entitled 
"Bioinformatically Detectable Group of Novel Viral Regulatory
Genes and Uses Thereof"; This application also is a
continuation in part of U.S Provisional Patent Application 
Serial No. 60/441230 filed 16-Jan-03, entitled "Bioinfor- 
matically Detectable Group of Novel HIV Regulatory Genes 
and Uses Thereof", the disclosure of which is hereby incorporated
by reference and claims priority therefrom;
This application also is a continuation in part of U.S Provi- 
sional Patent Application Serial No.60/441241, filed 
17-Jan-03, entitled "Bioinformatically Detectable Group of 
Novel Vaccinia Regulatory Genes and Uses Thereof ", the 
disclosure of which is hereby incorporated by reference 
and claims priority therefrom; U.S Provisional Patent Ap- 
plication Serial No.60/441241, filed 17-Jan-03, entitled 
"Bioinformatically Detectable Group of Novel Vaccinia 
Regulatory Genes and Uses Thereof" is a continuation in
part of and claims priority from the following patent ap- 
plications, the disclosures of which applications are all 



hereby incorporated herein by reference: U.S Patent Appli- 
cation Serial No. 10/310188 filed 5-Dec-02, and U.S. 
Patent Application Serial No. 10/303778 filed 26-Nov-02. 
Both of the aforesaid patent applications are entitled 
"Bioinformatically Detectable Group of Novel Viral Regulatory
Genes and Uses Thereof"; This application also is a
continuation in part of U.S Provisional Patent Application 
Serial No. 60/441230 filed 16-Jan-03, entitled "Bioinfor- 
matically Detectable Group of Novel HIV Regulatory Genes 
and Uses Thereof", the disclosure of which is hereby in- 
corporated by reference and claims priority therefrom; U.S 
Provisional Patent Application Serial No. 60/441230, filed 
17-Jan-03, entitled "Bioinformatically Detectable Group of 
Novel Vaccinia Regulatory Genes and Uses Thereof" is a
continuation in part of and claims priority from the fol- 
lowing patent applications, the disclosures of which appli- 
cations are all hereby incorporated herein by reference: 
U.S Patent Application Serial No. 10/310188 filed 
5-Dec-02, and U.S. Patent Application Serial No. 
10/303778 filed 26-Nov-02. Both of the aforesaid patent 
applications are entitled "Bioinformatically Detectable 
Group of Novel Viral Regulatory Genes and Uses Thereof";
U.S Patent Application Serial No. 10/310188, filed 



05-Dec-02, entitled "Bioinformatically Detectable Group 
of Novel Viral Regulatory Genes and Uses Thereof" is a 
continuation in part of U.S Patent Application Serial 
No.10/303778, filed 26-Nov-02, entitled "Bioinformati- 
cally Detectable Group of Novel Viral Regulatory Genes 
and Uses Thereof", the disclosure of which is hereby incorporated
by reference and claims priority therefrom.
Background of the invention

FIELD OF THE INVENTION 

[0018] The present invention relates to a group of bioinformati- 
cally detectable novel viral oligonucleotides and to a 
group of bioinformatically detectable novel human 
oligonucleotides associated with viral infections, both of which are
identified here as "Genomic Address Messenger" (GAM) 
oligonucleotides. 



[0019] All of the abovementioned oligonucleotides are believed to be
related to the microRNA (miRNA) group of oligonu- 
cleotides. 
DESCRIPTION OF PRIOR ART 

[0020] miRNA oligonucleotides are short, ~22 nucleotide (nt)-long,
non-coding, regulatory RNA oligonucleotides
that are found in a wide range of species. miRNA oligonu- 
cleotides are believed to function as specific gene transla- 
tion repressors and are sometimes involved in cell differ- 
entiation. 

[0021] The ability to detect novel miRNA oligonucleotides is lim- 
ited by the methodologies used to detect such oligonu- 
cleotides. All miRNA oligonucleotides identified so far ei- 
ther present a visibly discernable whole body phenotype, 
as do Lin-4 and Let-7 (Wightman, B., Ha, I., and Ruvkun, G.,
Cell 75: 855-862 (1993); Reinhart et al. Nature 403: 
901-906 (2000)), or produce sufficient quantities of RNA 
so as to be detected by standard molecular biological 
techniques. 

[0022] Ninety-three miRNA oligonucleotides have been discovered
in several species (Lau et al., Science 294: 858-862
(2001); Lagos-Quintana et al., Science 294: 853-858
(2001)) by sequencing a limited number of clones (300 by
Lau and 100 by Lagos-Quintana) of size-fractionated
small segments of RNA. miRNAs that were detected in 
these studies therefore represent the more prevalent 
among the miRNA oligonucleotide family and cannot be
much rarer than 1% of all small ~20 nt-long RNA oligonu- 
cleotides. 
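
The sampling argument above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each sequenced clone is an independent draw from the pool of small RNAs, so that an oligonucleotide making up a fraction f of the pool appears at least once among n clones with probability 1 - (1 - f)^n. The function name and the independence assumption are hypothetical and are not taken from the cited studies.

```python
def detection_probability(fraction: float, clones: int) -> float:
    """Chance that an RNA making up `fraction` of the pool shows up
    at least once among `clones` independently sampled clones.

    Assumption (not from the cited studies): clones are independent draws.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if clones < 0:
        raise ValueError("clones must be non-negative")
    # P(at least one hit) = 1 - P(no hit in any of the draws)
    return 1.0 - (1.0 - fraction) ** clones


if __name__ == "__main__":
    # Clone counts quoted in the text: 300 (Lau) and 100 (Lagos-Quintana).
    for clones in (300, 100):
        for fraction in (0.01, 0.001):
            p = detection_probability(fraction, clones)
            print(f"clones={clones:3d} fraction={fraction:.3f} P(detect)={p:.3f}")
```

Under these assumptions, an oligonucleotide present at 1% of the pool is found among 300 clones with a probability of about 0.95, whereas one present at 0.1% is found with a probability of only about 0.26, which is consistent with the limitation described above.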

[0023] The aforementioned studies provide no basis for the de- 
tection of miRNA oligonucleotides which either do not 
present a visually discernable whole body phenotype, or 
are rare (e.g. rarer than 0.1% of all of the size- 
fractionated, ~20 nt-long RNA segments that were ex- 
pressed in the tissues examined), and therefore do not 
produce large enough quantities of RNA to be detected by 
standard biological techniques. 

[0024] To date, miRNA oligonucleotides have not been detected 
in viruses. 

[0025] The following U.S. Patents relate to bioinformatic detec- 
tion of genes: U.S Patent No. 348935, entitled "Statistical 
algorithms for folding and target accessibility prediction 
and design of nucleic acids", U.S Patent No. 6,369,195, 
entitled "Prostate-specific gene for diagnosis, prognosis 
and management of prostate cancer", and U.S Patent 
No. 6,291,666, entitled "Spike tissue-specific promoter",



each of which is hereby incorporated by reference herein. 
BRIEF DESCRIPTION OF SEQUENCE LISTING, TABLES AND 
COMPUTER PROGRAM LISTING 



[0026] A sequence listing, comprising 4,204,915 genomic sequences,
is attached to the present application, is contained in
a file named SEQ_LIST.txt (622912KB, 25-May-04), and is
hereby incorporated by reference herein.

[0027] Tables relating to genomic sequences are attached to the
present application and appear in the following files (size,
creation date) included on CD, incorporated herein:
TABLE_1.txt (113 MB, 24-May-04), TABLE_2A.txt (619 MB,
25-May-04), TABLE_2B.txt (515 MB, 25-May-04),
TABLE_3.txt (19.4 MB, 24-May-04), TABLE_4.txt (56.2 MB,
24-May-04), TABLE_5.txt (12.1 MB, 24-May-04),
TABLE_6.txt (377 MB, 24-May-04), TABLE_7.txt (587 MB,
24-May-04), TABLE_8_A.txt (619 MB, 24-May-04),
TABLE_8_B.txt (619 MB, 24-May-04), TABLE_8_C.txt (583
MB, 24-May-04), TABLE_9.txt (3.64 MB, 24-May-04),
TABLE_10.txt (98.5 MB, 24-May-04), and TABLE_11.txt (79.8
MB, 25-May-04), all of which are incorporated by reference
herein. Further, additional tables relating to genomic
sequences are attached to the present application and appear
in the following files (size, creation date) attached to the
application, incorporated herein: TABLE_12.txt (188 KB,
25-May-04), TABLE_13.txt (140 KB, 25-May-04) and
TABLE_14.txt (39 KB, 25-May-04), all of which are incorporated
by reference herein.

[0028] A computer program listing constructed and operative in 
accordance with a preferred embodiment of the present 
invention is enclosed on an electronic medium in computer
readable form, and is hereby incorporated by reference
herein. The computer program listing is contained in
7 files, the names, sizes and creation dates of which are as
follows: AUXILARY_FILES.txt (117K, 14-Nov-03);
EDIT_DISTANCE.txt (144K, 24-Nov-03); FIRST-K.txt (96K,
24-Nov-03); HAIRPIN_PREDICTION.txt (19K, 25-Mar-04);
TWO_PHASED_SIDE_SELECTOR.txt (4K, 14-Nov-03);
TWO_PHASED_PREDICTOR.txt (74K, 14-Nov-03); and
BS_CODE.txt (118K, 11-May-04).
Summary of the invention 

[0029] The present invention relates to a novel group of 659
bioinformatically detectable viral regulatory RNA oligonucleotides,
which repress expression of human target
genes, by means of complementary hybridization to bind- 
ing sites in untranslated regions of these human target 
genes. It is believed that this novel group of viral oligonucleotides
represents a pervasive viral mechanism of attacking
hosts, and therefore knowledge of this novel
group of viral oligonucleotides may be useful in prevent- 
ing and treating viral diseases. 
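
As a toy illustration of complementary hybridization to a binding site, the following sketch scans an untranslated-region sequence for sites that are perfectly complementary to a short RNA oligonucleotide. It is a simplified assumption for illustration only: the function names and the example sequences are hypothetical, perfect Watson-Crick pairing is assumed although real miRNA-target pairing generally tolerates mismatches, and the sketch does not reproduce the detection engine of the present invention.

```python
# Watson-Crick complement table for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A-U, C-G pairing)."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def find_binding_sites(oligo: str, utr: str) -> list[int]:
    """Return 0-based start positions in `utr` where `oligo` could pair
    with perfect complementarity (a deliberately strict, hypothetical rule)."""
    site = reverse_complement(oligo)
    utr = utr.upper()
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping sites are also reported.
        start = utr.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical sequences chosen only to demonstrate the search.
    print(find_binding_sites("UAGCUUAUC", "CCGAUAAGCUAGG"))
```

A practical target-prediction method would normally score partial complementarity and binding energy rather than require an exact match; the exact-match rule is used here only to keep the example short.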

[0030] Additionally, the present invention relates to a novel
group of 6,272 bioinformatically detectable human regulatory
RNA oligonucleotides, which repress expression of
viral target genes, by means of complementary hybridiza- 
tion to binding sites in untranslated regions of these viral 
target genes. It is believed that this novel group of human 
oligonucleotides represents a pervasive novel anti-viral 
host defense mechanism, and therefore knowledge of this 
novel group of human oligonucleotides may be useful in 
preventing and treating viral diseases. 

[0031] Furthermore, the present invention relates to a novel 
group of 104,504 bioinformatically detectable human 
regulatory RNA oligonucleotides, which repress expres- 
sion of human target genes associated with viral diseases, 
by means of complementary hybridization to binding sites 
in untranslated regions of these human target genes. It is 
believed that this novel group of human oligonucleotides 
represents a pervasive novel host response mechanism, 
and therefore knowledge of this novel group of human
oligonucleotides may be useful in preventing and treating
viral diseases. 

[0032] Additionally, the present invention relates to a novel
group of 1,406 bioinformatically detectable viral regulatory
RNA oligonucleotides, which repress expression of
viral target genes, by means of complementary hybridiza- 
tion to binding sites in untranslated regions of these viral 
target genes. It is believed that this novel group of viral 
oligonucleotides represents a pervasive novel internal viral 
regulation mechanism, and therefore knowledge of this 
novel group of viral oligonucleotides may be useful in 
preventing and treating viral diseases. 

[0033] Also disclosed are 190 novel microRNA-cluster like viral 
polynucleotides and 14,813 novel microRNA-cluster like 
human polynucleotides, both referred to here as Genomic 
Record (GR) polynucleotides. 

[0034] In various preferred embodiments, the present invention 
seeks to provide an improved method and system for detection
and prevention of viral diseases, which are mediated
by the abovementioned groups of novel oligonucleotides. 

[0035] Accordingly, the invention provides several substantially 
pure nucleic acids (e.g., genomic DNA, cDNA or synthetic 
DNA) each comprising a novel GAM oligonucleotide, vectors
comprising the DNAs, probes comprising the DNAs, a
method and system for selectively modulating translation
of known target genes utilizing the vectors, and a method 
and system utilizing the GAM probes to modulate expres- 
sion of target genes. 

[0036] The present invention represents a scientific breakthrough,
disclosing novel miRNA-like oligonucleotides the
number of which is dramatically larger than previously believed
to exist. Prior-art studies reporting miRNA oligonucleotides
(Lau et al., Science 294: 858-862 (2001); Lagos-Quintana
et al., Science 294: 853-858 (2001)) discovered
93 miRNA oligonucleotides in several species, including
21 in human, using conventional molecular biology methods,
such as cloning and sequencing.

[0037] Molecular biology methodologies employed by these 
studies are limited in their ability to detect rare miRNA 
oligonucleotides, since these studies relied on sequencing 
of a limited number of clones (300 clones by Lau and 100 
clones by Lagos-Quintana) of small segments (i.e. size-fractionated)
of RNA. miRNA oligonucleotides detected in
these studies therefore represent the more prevalent
among the miRNA oligonucleotide family, and are typically
not much rarer than 1% of all small ~20 nt-long RNA
oligonucleotides present in the tissue from which the RNA was
extracted.

[0038] Recent studies state the number of miRNA oligonu- 
cleotides to be limited, and describe the limited sensitivity 
of available methods for detection of miRNA oligonu- 
cleotides: "The estimate of 255 human miRNA oligonu- 
cleotides is an upper bound implying that no more than 
40 miRNA oligonucleotides remain to be identified in 
mammals" (Lim et al., Science, 299:1540 (2003)); "Esti- 
mates place the total number of vertebrate miRNA genes 
at about 200-250" (Ambros et al. Curr. Biol. 13:807-818 
(2003)); and "Confirmation of very low abundance miRNAs
awaits the application of detection methods more sensi- 
tive than Northern blots" (Ambros et al. Curr. Biol. 
13:807-818 (2003)). 

[0039] The oligonucleotides of the present invention represent a 
revolutionary new dimension of genomics and of biology: 
a dimension comprising a huge number of non-pro- 
tein-coding oligonucleotides which modulate expression 
of thousands of proteins and are associated with numer- 
ous major diseases. This new dimension disclosed by the 
present invention dismantles a central dogma that has 
dominated life-sciences during the past 50 years, a
dogma which has emphasized the importance of protein-coding
regions of the genome, holding non-protein-coding
regions to be of little consequence, often
dubbing them "junk DNA". 

[0040] Indeed, only in November, 2003 has this long held belief 
as to the low importance of non-protein-coding regions 
been vocally challenged. As an example, an article titled 
"The Unseen Genome - Gems in the Junk" (Gibbs, W.W. 
Sci. Am. 289:46-53 (2003)) asserts that the failure to recognize
the importance of non-protein-coding regions
"may well go down as one of the biggest mistakes in the 
history of molecular biology." Gibbs further asserts that 
"what was damned as junk because it was not understood, 
may in fact turn out to be the very basis of human com- 
plexity." The present invention provides a dramatic leap in 
understanding specific important roles of non-pro- 
tein-coding regions. 

[0041] An additional scientific breakthrough of the present in- 
vention is a novel conceptual model disclosed by the 
present invention, which conceptual model is preferably 
used to encode in a genome the determination of cell dif- 
ferentiation, utilizing oligonucleotides and polynu- 
cleotides of the present invention. 



[0042] Using the bioinformatic engine of the present invention,
1,655 viral GAM oligonucleotides and their respective precursors
and targets have been detected and 105,537 human
GAM oligonucleotides and their respective precursors
and targets have been detected. These bioinformatic predictions
are supported by robust biological studies.
Microarray experiments validated expression
of 1,637 of the human GAM oligonucleotides of the 
present invention. Of these, 938 received an extremely 
high score: over six standard deviations higher than the 
background "noise" of the microarray, and over two stan- 
dard deviations above their individual "mismatch" control 
probes, and 69 received a high score: over four standard
deviations higher than the background "noise" of the mi- 
croarray. Further, 38 GAM oligonucleotides were se- 
quenced. 

[0043] In various preferred embodiments, the present invention 
seeks to provide an improved method and system for 
specific modulation of the expression of specific target 
genes involved in significant human diseases. It also pro- 
vides an improved method and system for detection of the 
expression of novel oligonucleotides of the present inven- 



tion, which modulate these target genes. In many cases, 
the target genes may be known and fully characterized;
however, in alternative embodiments of the present invention,
unknown or less well characterized genes may be
targeted. 

[0044] A "Nucleic acid" is defined as a ribonucleic acid (RNA) 

molecule, or a deoxyribonucleic acid (DNA) molecule, or 
complementary deoxyribonucleic acid (cDNA), comprising 
either naturally occurring nucleotides or non-naturally oc- 
curring nucleotides. 

[0045] "Substantially pure nucleic acid", "Isolated Nucleic Acid", 
"Isolated Oligonucleotide" and "Isolated Polynucleotide" are
defined as a nucleic acid that is free of the genome of the 
organism from which the nucleic acid is derived, and in- 
clude, for example, a recombinant nucleic acid which is 
incorporated into a vector, into an autonomously replicat- 
ing plasmid or virus, or into the genomic nucleic acid of a 
prokaryote or eukaryote at a site other than its natural 
site; or which exists as a separate molecule (e.g., a cDNA 
or a genomic or cDNA fragment produced by PCR or restriction
endonuclease digestion) independent of other
nucleic acids. 

[0046] An "Oligonucleotide" is defined as a nucleic acid comprising
2-139 nts, or preferably 16-120 nts. A "Polynucleotide"
is defined as a nucleic acid comprising
140-5000 nts, or preferably 140-1000 nts. 
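
The length ranges in the preceding definition can be illustrated with a minimal sketch. The function name `classify_by_length` and the returned labels are hypothetical and are introduced here only for illustration; only the numeric ranges come from the definition itself.

```python
def classify_by_length(seq: str) -> tuple[str, bool]:
    """Classify a nucleic acid sequence by length in nucleotides (nts).

    Returns (kind, preferred), where `preferred` is True when the length
    falls inside the preferred sub-range given in the definition.
    The labels and the tuple layout are illustrative assumptions.
    """
    n = len(seq)
    if 2 <= n <= 139:
        # Oligonucleotide: 2-139 nts, preferably 16-120 nts.
        return "oligonucleotide", 16 <= n <= 120
    if 140 <= n <= 5000:
        # Polynucleotide: 140-5000 nts, preferably 140-1000 nts.
        return "polynucleotide", n <= 1000
    # Lengths outside both ranges are not covered by the definition.
    return "unclassified", False
```

For example, a 22 nt sequence is classified as an oligonucleotide in the preferred range, while a 2,000 nt sequence is a polynucleotide outside the preferred range.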

[0047] A "Complementary" sequence is defined as a first nucleotide
sequence which is the reverse complement of a
second nucleotide sequence: the first nucleotide sequence 
is reversed relative to a second nucleotide sequence, and 
wherein each nucleotide in the first nucleotide sequence is 
complementary to a corresponding nucleotide in the sec- 
ond nucleotide sequence (e.g. ATGGC is the complemen- 
tary sequence of GCCAT). 
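
The definition above can be checked with a short sketch. It assumes DNA letters (A, C, G, T) only; the helper names `reverse_complement` and `is_complementary` are hypothetical and not part of the disclosure.

```python
# Watson-Crick complement table for DNA letters (assumption: DNA alphabet only).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each nucleotide, then reverse the order."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def is_complementary(first: str, second: str) -> bool:
    """True when `first` is the reverse complement of `second`."""
    return first.upper() == reverse_complement(second)
```

With these helpers, `reverse_complement("GCCAT")` yields `"ATGGC"`, matching the example given in the definition.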

[0048] "Hybridization", "Binding" and "Annealing" are defined as 
hybridization, under in vivo physiological conditions, of a 
first nucleic acid to a second nucleic acid, which second 
nucleic acid is at least partially complementary to the first 
nucleic acid. 

[0049] A "Hairpin Structure" is defined as an oligonucleotide hav- 
ing a nucleotide sequence that is 50-140 nts in length, 
the first half of which nucleotide sequence is at least par- 
tially complementary to the second part thereof, thereby 
causing the nucleic acid to fold onto itself, forming a sec- 
ondary hairpin structure. 
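
As a rough illustration of the Hairpin Structure definition, the sketch below tests the length range and counts how many positions of the first half can pair with the mirrored positions of the second half. It is a crude check under stated assumptions: it ignores loops, bulges and folding energy, it accepts only Watson-Crick pairs, and the 0.7 threshold for "at least partially complementary" is an arbitrary illustrative value, not one taken from the disclosure.

```python
# Watson-Crick pairs for RNA and DNA letters (assumption: no G-U wobble pairs).
_PAIRS = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(seq: str) -> float:
    """Fraction of first-half positions that pair with the mirrored
    position in the second half when the sequence folds onto itself."""
    seq = seq.upper()
    half = len(seq) // 2
    if half == 0:
        return 0.0
    first = seq[:half]
    second = seq[len(seq) - half:]
    matches = sum((a, b) in _PAIRS for a, b in zip(first, reversed(second)))
    return matches / half


def looks_like_hairpin(seq: str, min_fraction: float = 0.7) -> bool:
    """Length must be 50-140 nts; `min_fraction` is a hypothetical threshold."""
    return 50 <= len(seq) <= 140 and paired_fraction(seq) >= min_fraction
```

A dedicated RNA folding program would be needed for any realistic assessment; this sketch only mirrors the wording of the definition.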

[0050] A "Hairpin-Shaped Precursor" is defined as a Hairpin 



Structure which is processed by a Dicer enzyme complex, 
yielding an oligonucleotide which is about 19 to about 24 
nts in length. 

[0051] "Inhibiting translation" is defined as the ability to prevent 
synthesis of a specific protein encoded by a respective 
gene by means of inhibiting the translation of the mRNA 
of this gene. For example, inhibiting translation may in- 
clude the following steps: (1) a DNA segment encodes an 
RNA, the first half of whose sequence is partially comple- 
mentary to the second half thereof; (2) the precursor folds 
onto itself forming a hairpin-shaped precursor; (3) a Dicer 
enzyme complex cuts the hairpin-shaped precursor yield- 
ing an oligonucleotide that is approximately 22 nt in 
length; (4) the oligonucleotide binds complementarily to 
at least one binding site, having a nucleotide sequence 
that is at least partially complementary to the oligonu- 
cleotide, which binding site is located in the mRNA of a 
target gene, preferably in the untranslated region (UTR) of 
a target gene, such that the binding inhibits translation of 
the target protein. 
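
Step (4) above, in which the oligonucleotide binds an at least partially complementary site in the mRNA, can be sketched as a simple ungapped scan. This is an illustration only and is not the detection engine of the present invention: the function names, the RNA-only alphabet, the absence of gaps and the 0.8 threshold are all assumptions made for the example.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_site(oligo: str) -> str:
    """mRNA sequence (5'->3') that would be fully complementary to `oligo`."""
    oligo = oligo.upper().replace("T", "U")
    # Unknown letters map to "N", which never matches a real base.
    return "".join(_RNA_COMPLEMENT.get(base, "N") for base in reversed(oligo))


def find_binding_sites(oligo: str, utr: str, min_fraction: float = 0.8):
    """Return (start, fraction) for each ungapped window of `utr` whose
    complementarity to `oligo` is at least `min_fraction` (hypothetical)."""
    target = perfect_site(oligo)
    utr = utr.upper().replace("T", "U")
    width = len(target)
    hits = []
    if width == 0:
        return hits
    for start in range(len(utr) - width + 1):
        window = utr[start:start + width]
        fraction = sum(x == y for x, y in zip(window, target)) / width
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits
```

Each hit reports the 0-based start of the candidate binding site in the scanned region and the fraction of complementary positions.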

[0052] A "Translation inhibitor site" is defined as the minimal nu- 
cleotide sequence sufficient to inhibit translation. 

[0053] The present invention describes novel GAM oligonucleotides,
detected using a bioinformatic engine described
hereinabove. The ability of this detection engine has been
demonstrated using stringent algorithmic criteria, showing
that the engine has both high sensitivity, indicated by
the high detection rate of published miRNA oligonucleotides
and their targets, as well as high specificity, indicated
by the low number of "background" hairpin candidates
passing its filters. Laboratory tests, based both on
sequencing of predicted GAM oligonucleotides and on mi- 
croarray experiments, validated 1672 of the GAM oligonu- 
cleotides in the present invention. Further, almost all of 
the viral target genes (2,055 of the 2,195) and almost all 
of the human target genes (588 out of 657) described in 
the present invention are bound by one or more of the 
1672 human GAM oligonucleotides validated by the microarray
experiments.

[0054] There is thus provided in accordance with a preferred embodiment
of the present invention a bioinformatically detectable
isolated oligonucleotide which is endogenously
processed from a hairpin-shaped precursor, and anneals 
to a portion of a mRNA transcript of a target gene, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 



wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs: 1-1672 and 
1673-119264. 
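
The "at least 80% sequence identity" criterion can be illustrated as follows. The text does not specify how identity is computed, so this sketch assumes a simple ungapped comparison: the shorter sequence is slid along the longer one, the best count of identical positions is taken, and it is divided by the length of the longer sequence. The names `percent_identity` and `meets_identity` are hypothetical, and a gapped alignment tool may give different values.

```python
def percent_identity(a: str, b: str) -> float:
    """Best ungapped percent identity between two sequences.

    Assumption: identities are counted at the best offset of the shorter
    sequence within the longer one, then divided by the longer length.
    """
    a, b = a.upper(), b.upper()
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        best = max(best, sum(x == y for x, y in zip(short, window)))
    return 100.0 * best / len(long_)


def meets_identity(a: str, b: str, threshold: float = 80.0) -> bool:
    """True when the two sequences share at least `threshold` percent identity."""
    return percent_identity(a, b) >= threshold
```

Under this assumption, two 22 nt sequences that differ at 2 positions share about 90.9% identity and meet the criterion, whereas 5 differences give about 77.3% and do not.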

[0055] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide having a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs: 1-1672 and 1673-119264. 

[0056] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable first oligonucleotide which is a por- 
tion of a mRNA transcript of a target gene, and anneals to 
a second oligonucleotide that is endogenously processed 
from a hairpin precursor, wherein binding of the first 
oligonucleotide to the second oligonucleotide represses 
expression of the target gene, and wherein the nucleotide sequence
of the second oligonucleotide is selected from the
group consisting of SEQ ID NOs: 1-1672 and 
1673-119264. 

[0057] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable oligonucleotide having a nucleotide 



sequence selected from the group consisting of SEQ ID 
NOs: 3362235-4097720. 

[0058] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with B19 virus infection, wherein binding of the 
oligonucleotide to the mRNA transcript represses expres- 
sion of the target gene, and wherein the oligonucleotide 
has at least 80% sequence identity with a nucleotide se- 
quence selected from the group consisting of SEQ ID NOs 
shown in Table 13 row 2. 

[0059] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Barmah Forest virus infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 3. 

[0060] There is still further provided in accordance with another 



preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with BK polyomavirus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 4. 

[0061] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Bunyamwera virus infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 5. 

[0062] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 



associated with Colorado tick fever virus infection,
wherein binding of the oligonucleotide to the mRNA transcript
represses expression of the target gene, and
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 6. 

[0063] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Crimean-Congo hemorrhagic fever virus 
infection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 7. 

[0064] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Dengue virus infection, wherein binding of 
the oligonucleotide to the mRNA transcript represses ex- 



pression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 8. 

[0065] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Dobrava virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 9. 

[0066] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Eastern equine encephalitis virus infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 



group consisting of SEQ ID NOs shown in Table 13 row
10. 

[0067] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Hepatitis A virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 11. 

[0068] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Hepatitis B virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 12. 

[0069] There is additionally provided in accordance with another 



preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Hepatitis C virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 13. 

[0070] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Hepatitis D virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 14. 

[0071] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 



associated with Hepatitis E virus infection, wherein binding
of the oligonucleotide to the mRNA transcript represses
expression of the target gene, and wherein the
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 15. 

[0072] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human adenovirus A infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 16. 

[0073] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human adenovirus B (HAdV-B) infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 



wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
17. 

[0074] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human adenovirus C infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 18. 

[0075] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human adenovirus D infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 



of SEQ ID NOs shown in Table 13 row 19. 

[0076] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human adenovirus E infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 20. 

[0077] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human adenovirus F infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 21. 

[0078] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 



matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human astrovirus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 22. 

[0079] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Human coronavirus 229E infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
23. 

[0080] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 



associated with Human coronavirus OC43 (HCoV-OC43) 
infection, wherein binding of the oligonucleotide to the
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 24. 

[0081] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human echovirus 1 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 25. 

[0082] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human enterovirus A infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 



presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 26. 

[0083] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human enterovirus B infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 27. 

[0084] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human enterovirus C infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 



of SEQ ID NOs shown in Table 13 row 28. 

[0085] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human enterovirus D infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 29. 

[0086] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human enterovirus E infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 30. 

[0087] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 



matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human erythrovirus V9 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 31. 

[0088] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 1 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 32. 

[0089] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 10 infection, wherein 



binding of the oligonucleotide to the mRNA transcript represses
expression of the target gene, and wherein the
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 33. 

[0090] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 2 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 34. 

[0091] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 3 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 



nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 35. 

[0092] There is still further provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 4 (Epstein-Barr virus) 
infection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 36. 

[0093] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 5 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 37. 



[0094] There is moreover provided in accordance with another

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 6 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 38. 

[0095] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 6B infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 39. 

[0096] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 



neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 7 infection, wherein
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 40. 

[0097] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human herpesvirus 9 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 41. 

[0098] There is moreover provided in accordance with another 

preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human immunodeficiency virus 1 (HIV-1) 
infection, wherein binding of the oligonucleotide to the 



mRNA transcript represses expression of the target gene,
and wherein the oligonucleotide has at least 80% sequence
identity with a nucleotide sequence selected from
the group consisting of SEQ ID NOs shown in Table 13 
row 42. 

[0099] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human immunodeficiency virus 2 (HIV-2) 
infection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 43. 

[0100] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human metapneumovirus infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 



wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
44. 

[0101] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 11 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
45. 

[0102] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 16 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
46. 

[0103] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 17 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
47. 

[0104] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 18 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the
group consisting of SEQ ID NOs shown in Table 13 row
48. 

[0105] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 18, complete 
genome infection, wherein binding of the oligonucleotide 
to the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 49. 

[0106] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 19 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row
50. 

[0107] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 31 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
51. 

[0108] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 45 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
52.

[0109] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 5 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
53. 

[0110] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human papillomavirus type 6 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
54. 

[0111] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Human papillomavirus type 8 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
55. 

[0112] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human parainfluenza virus 1 strain Wash- 
ington/1964 infection, wherein binding of the oligonu- 
cleotide to the mRNA transcript represses expression of 
the target gene, and wherein the oligonucleotide has at 
least 80% sequence identity with a nucleotide sequence 
selected from the group consisting of SEQ ID NOs shown 
in Table 13 row 56. 

[0113] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Human parainfluenza virus 2 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
57. 

[0114] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human parainfluenza virus 3 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
58. 

[0115] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Human parechovirus 2 infection, wherein
binding of the oligonucleotide to the mRNA transcript represses
expression of the target gene, and wherein the
oligonucleotide has at least 80% sequence identity with a
nucleotide sequence selected from the group consisting
of SEQ ID NOs shown in Table 13 row 59.

[0116] There is still further provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human respiratory syncytial virus infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
60. 

[0117] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human rhinovirus 89 infection, wherein
binding of the oligonucleotide to the mRNA transcript represses
expression of the target gene, and wherein the
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 61. 

[0118] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human rhinovirus B infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 62. 

[0119] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human T-lymphotropic virus 1 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
63. 

[0120] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Human T-lymphotropic virus 2 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
64. 

[0121] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Influenza A virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting
of SEQ ID NOs shown in Table 13 row 65. 

[0122] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Influenza B virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 66. 

[0123] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Japanese encephalitis virus infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
67. 

[0124] There is still further provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with JC virus infection, wherein binding of the
oligonucleotide to the mRNA transcript represses expres- 
sion of the target gene, and wherein the oligonucleotide 
has at least 80% sequence identity with a nucleotide se- 
quence selected from the group consisting of SEQ ID NOs 
shown in Table 13 row 68. 

[0125] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Machupo virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 69. 

[0126] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Marburg virus infection, wherein binding
of the oligonucleotide to the mRNA transcript represses
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 70. 

[0127] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Measles virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 71. 

[0128] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Molluscum contagiosum virus infection, 
wherein binding of the oligonucleotide to the mRNA transcript
represses expression of the target gene, and
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
72. 

[0129] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Murray Valley encephalitis virus infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
73. 

[0130] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Norwalk virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonucleotide
has at least 80% sequence identity with a nucleotide
sequence selected from the group consisting of
SEQ ID NOs shown in Table 13 row 74. 

[0131] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Poliovirus infection, wherein binding of 
the oligonucleotide to the mRNA transcript represses ex- 
pression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 75. 

[0132] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Puumala virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 76. 

[0133] There is additionally provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Respiratory syncytial virus infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
77. 

[0134] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Reston Ebola virus (REBOV) infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
78. 

[0135] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Rubella virus infection, wherein binding of 
the oligonucleotide to the mRNA transcript represses ex- 
pression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 79. 

[0136] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with SARS coronavirus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 80. 

[0137] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Seoul virus infection, wherein binding of
the oligonucleotide to the mRNA transcript represses ex- 
pression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 81. 

[0138] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Sin Nombre virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 82. 

[0139] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Tula virus infection, wherein binding of 
the oligonucleotide to the mRNA transcript represses expression
of the target gene, and wherein the oligonucleotide
has at least 80% sequence identity with a nucleotide
sequence selected from the group consisting of
SEQ ID NOs shown in Table 13 row 83. 

[0140] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Uukuniemi virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 84. 

[0141] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Vaccinia virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 85. 

[0142] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Variola virus infection, wherein binding of 
the oligonucleotide to the mRNA transcript represses ex- 
pression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 86. 

[0143] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with West Nile virus infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 87. 

[0144] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinformatically
detectable isolated oligonucleotide which anneals
to a portion of a mRNA transcript of a target gene
associated with Western equine encephalomyelitis virus 
infection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 88. 

[0145] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Yellow fever virus infection, wherein bind- 
ing of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 89. 

[0146] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Zaire Ebola virus (ZEBOV) infection, 
wherein binding of the oligonucleotide to the mRNA transcript
represses expression of the target gene, and
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
90. 

[0147] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a method for 
treatment of a disease involving a tissue in which a pro- 
tein is pathologically expressed to an undesirable extent, 
the protein having a messenger RNA, the method includ- 
ing: providing a material which modulates activity of a mi- 
croRNA oligonucleotide which binds complementarily to a 
segment of the messenger RNA, and introducing the ma- 
terial into the tissue, causing modulation of the activity of 
the microRNA oligonucleotide and thereby modulating ex- 
pression of the protein in a desired manner. 

[0148] There is still further provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving tissue in which a pro- 
tein is pathologically expressed to an undesirable extent, 
the protein having a messenger RNA, the method includ- 
ing: providing a material which at least partially binds a 
segment of the messenger RNA that is bound complementarily
by a microRNA oligonucleotide, thereby modulating
expression of the protein, and introducing the material
into the tissue, thereby modulating expression of
the protein. 

[0149] There is additionally provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically over-expressed, the protein hav- 
ing a messenger RNA, the method including: providing a 
microRNA oligonucleotide which binds complementarily to 
a segment of the messenger RNA, and introducing the mi- 
croRNA oligonucleotide into the tissue, causing the mi- 
croRNA oligonucleotide to bind complementarily to a seg- 
ment of the messenger RNA and thereby inhibit expres- 
sion of the protein. 

[0150] There is moreover provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically over-expressed, the protein hav- 
ing a messenger RNA, the method including: providing a 
chemically-modified microRNA oligonucleotide which 
binds complementarily to a segment of the messenger 
RNA, and introducing the chemically-modified microRNA
oligonucleotide into the tissue, causing the microRNA 
oligonucleotide to bind complementarily to a segment of 
the messenger RNA and thereby inhibit expression of the 
protein. 

[0151] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a method for 
treatment of a disease involving a tissue in which a pro- 
tein is pathologically under-expressed, the protein having 
a messenger RNA, the method including: providing an 
oligonucleotide that inhibits activity of a microRNA 
oligonucleotide which binds complementarily to a seg- 
ment of the messenger RNA, and introducing the oligonu- 
cleotide into the tissue, causing inhibition of the activity 
of the microRNA oligonucleotide and thereby promotion 
of translation of the protein. 

[0152] There is still further provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically under-expressed, the protein 
having a messenger RNA, the method including: providing 
a chemically-modified oligonucleotide that inhibits activity
of a microRNA oligonucleotide which binds complementarily
to a segment of the messenger RNA, and introducing
the chemically-modified oligonucleotide into the
tissue, causing inhibition of the activity of the microRNA 
oligonucleotide and thereby promotion of translation of 
the protein. 

[0153] There is additionally provided in accordance with another 
preferred embodiment of the present invention a method 
for diagnosis of a disease involving a tissue in which a 
protein is expressed to abnormal extent, the protein hav- 
ing a messenger RNA, the method including: assaying a 
microRNA oligonucleotide which at least partially binds a 
segment of the messenger RNA and modulates expression 
of the protein, thereby providing an indication of at least 
one parameter of the disease. 

[0154] There is moreover provided in accordance with another 
preferred embodiment of the present invention a method 
for detection of expression of an oligonucleotide, the 
method including: determining a first nucleotide sequence 
of a first oligonucleotide, which first nucleotide sequence 
is not complementary to a genome of an organism, re- 
ceiving a second nucleotide sequence of a second 
oligonucleotide whose expression is sought to be detected,
designing a third nucleotide sequence that is complementary
to the second nucleotide sequence of the second
oligonucleotide, and a fourth nucleotide sequence
that is complementary to a fifth nucleotide sequence 
which is different from the second nucleotide sequence of 
the second oligonucleotide by at least one nucleotide, 
synthesizing a first oligonucleotide probe having a sixth 
nucleotide sequence including the third nucleotide se- 
quence followed by the first nucleotide sequence of the 
first oligonucleotide, and a second oligonucleotide probe 
having a seventh nucleotide sequence including the fourth 
nucleotide sequence followed by the first nucleotide se- 
quence of the first oligonucleotide, locating the first 
oligonucleotide probe and the second oligonucleotide 
probe on a microarray platform, receiving an RNA test 
sample from at least one tissue of the organism, obtaining 
size-fractionated RNA from the RNA test sample, amplify- 
ing the size-fractionated RNA, hybridizing the adaptor- 
linked RNA with the first and second oligonucleotide 
probes on the microarray platform, and determining ex- 
pression of the first oligonucleotide in the at least one tis- 
sue of the organism, based at least in part on the hy- 
bridizing. 
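
The probe-design steps recited above may be sketched, purely as a hedged illustration, as follows. The sketch builds a perfect-match probe (the third sequence followed by the first sequence) and a single-mismatch control probe (the fourth sequence followed by the first sequence). The function names, the use of the reverse complement as the "complementary" sequence, the RNA alphabet, the choice of a central mismatch position, and the example sequences are all assumptions for illustration and are not taken from the specification.

```python
# Watson-Crick pairing for an RNA alphabet (assumption: probes are written as RNA).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def single_mismatch_variant(rna: str, position: int) -> str:
    """Return a copy of `rna` differing by exactly one nucleotide at `position`
    (the 'fifth sequence' of the method, under the assumptions stated above)."""
    rna = rna.upper()
    original = rna[position]
    substitute = next(base for base in "ACGU" if base != original)
    return rna[:position] + substitute + rna[position + 1:]


def design_probes(target: str, tail: str, mismatch_position=None):
    """Build (perfect_match_probe, mismatch_probe).

    target: the second sequence, whose expression is to be detected.
    tail:   the first sequence, assumed not complementary to the genome.
    """
    if mismatch_position is None:
        mismatch_position = len(target) // 2  # hypothetical default: central base
    third = reverse_complement(target)
    fifth = single_mismatch_variant(target, mismatch_position)
    fourth = reverse_complement(fifth)
    return third + tail.upper(), fourth + tail.upper()


if __name__ == "__main__":
    # Hypothetical short sequences chosen only to keep the example readable.
    perfect, mismatch = design_probes("ACGUA", "GGG")
    print(perfect)
    print(mismatch)
```

Comparing hybridization signal between the two probes is one way a mismatch control can help distinguish specific from non-specific binding; the sketch covers only sequence design, not the microarray or RNA preparation steps.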

[0155] There is further provided in accordance with another preferred
embodiment of the present invention a bioinformatically
detectable isolated polynucleotide which is endogenously
processed into a plurality of hairpin-shaped
precursor oligonucleotides, each of which is endogenously 
processed into a respective oligonucleotide, which in turn 
anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene. 

[0156] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which is en- 
dogenously processed from a hairpin-shaped precursor, 
and anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the target gene does not encode a protein. 

[0157] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which is en- 
dogenously processed from a hairpin-shaped precursor, 
and anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein a function of the oligonucleotide includes
modulation of cell type.

[0158] There is moreover provided in accordance with another
preferred embodiment of the present invention a
bioinformatically detectable isolated oligonucleotide which is
endogenously processed from a hairpin-shaped precursor,
and anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide is maternally transferred by a 
cell to at least one daughter cell of the cell, and a function 
of the oligonucleotide includes modulation of cell type of 
the daughter cell. 

[0159] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a method for 
bioinformatic detection of microRNA oligonucleotides, the 
method including: bioinformatically detecting a hairpin- 
shaped precursor oligonucleotide, bioinformatically de- 
tecting an oligonucleotide which is endogenously pro- 
cessed from the hairpin-shaped precursor oligonu- 
cleotide, and bioinformatically detecting a target gene of 
the oligonucleotide wherein the oligonucleotide anneals to 
at least one portion of a mRNA transcript of the target 
gene, and wherein the binding represses expression of
the target gene, and the target gene is associated with a
disease.

Brief Description of Drawings

[0160] Fig. 1 is a simplified diagram illustrating a mode by which 
an oligonucleotide of a novel group of oligonucleotides of 
the present invention modulates expression of known tar- 
get genes; 

[0161] Fig. 2 is a simplified block diagram illustrating a bioinfor- 
matic oligonucleotide detection system capable of detect- 
ing oligonucleotides of the novel group of oligonu- 
cleotides of the present invention, which system is con- 
structed and operative in accordance with a preferred em- 
bodiment of the present invention; 

[0162] Fig. 3 is a simplified flowchart illustrating operation of a 
mechanism for training of a computer system to recog- 
nize the novel oligonucleotides of the present invention, 
which mechanism is constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion; 

[0163] Fig. 4A is a simplified block diagram of a non-coding ge- 
nomic sequence detector constructed and operative in ac- 
cordance with a preferred embodiment of the present in- 
vention; 



[0164] Fig. 4B is a simplified flowchart illustrating operation of a
non-coding genomic sequence detector constructed and 
operative in accordance with a preferred embodiment of 
the present invention; 

[0165] Fig. 5A is a simplified block diagram of a hairpin detector 
constructed and operative in accordance with a preferred 
embodiment of the present invention; 

[0166] Fig. 5B is a simplified flowchart illustrating operation of a 
hairpin detector constructed and operative in accordance 
with a preferred embodiment of the present invention; 

[0167] Fig. 6A is a simplified block diagram of a Dicer-cut loca- 
tion detector constructed and operative in accordance 
with a preferred embodiment of the present invention; 

[0168] Fig. 6B is a simplified flowchart illustrating training of a 
Dicer-cut location detector constructed and operative in 
accordance with a preferred embodiment of the present 
invention; 

[0169] Fig. 6C is a simplified flowchart illustrating operation of a 
Dicer-cut location detector constructed and operative in 
accordance with a preferred embodiment of the present 
invention; 

[0170] Fig. 7A is a simplified block diagram of a target gene
binding site detector constructed and operative in
accordance with a preferred embodiment of the present
invention;

[0171] Fig. 7B is a simplified flowchart illustrating operation of a 
target gene binding site detector constructed and opera- 
tive in accordance with a preferred embodiment of the 
present invention; 

[0172] Fig. 8 is a simplified flowchart illustrating operation of a 
function and utility analyzer constructed and operative in 
accordance with a preferred embodiment of the present 
invention; 

[0173] Fig. 9 is a simplified diagram describing a novel bioinfor- 
matically-detected group of regulatory polynucleotides, 
referred to here as Genomic Record (GR) polynucleotides, 
each of which encodes an "operon-like" cluster of novel 
microRNA-like oligonucleotides, which in turn modulate 
expression of one or more target genes; 

[0174] Fig. 10 is a block diagram illustrating different utilities of 
novel oligonucleotides and novel operon-like polynu- 
cleotides, both of the present invention; 

[0175] Figs. 11A and 11B are simplified diagrams which, when
taken together, illustrate a mode of oligonucleotide ther- 
apy applicable to novel oligonucleotides of the present in- 
vention; 



[0176] Fig. 12A is a bar graph illustrating performance results of 
a hairpin detector constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion; 

[0177] Fig. 12B is a line graph illustrating accuracy of a Dicer-cut 
location detector constructed and operative in accordance 
with a preferred embodiment of the present invention; 

[0178] Fig. 12C is a bar graph illustrating performance results of 
the target gene binding site detector 118, constructed and 
operative in accordance with a preferred embodiment of 
the present invention. 

[0179] Fig. 13 is a summary table of laboratory results validating 
expression of novel human oligonucleotides detected by a 
bioinformatic oligonucleotide detection engine con- 
structed and operative in accordance with a preferred em- 
bodiment of the present invention, thereby validating its 
efficacy; 

[0180] Fig. 14A is a schematic representation of an "operon-like" 
cluster of novel human hairpin sequences detected by a 
bioinformatic oligonucleotide detection engine constructed
and operative in accordance with a preferred embodiment
of the present invention, and non-GAM hairpin
sequences used as negative controls thereto; 



[0181] Fig. 14B is a schematic representation of secondary fold- 
ing of hairpins of the operon-like cluster of Fig. 14A; 

[0182] Fig. 14C is a picture of laboratory results demonstrating 
expression of novel oligonucleotides of Figs. 14A and 14B 
and lack of expression of the negative controls, thereby 
validating efficacy of bioinformatic detection of GAM 
oligonucleotides and GR polynucleotides detected by a 
bioinformatic oligonucleotide detection engine, con- 
structed and operative in accordance with a preferred em- 
bodiment of the present invention; 

[0183] Fig. 15A is an annotated sequence of EST72223 compris- 
ing known human microRNA oligonucleotide MIR98 and 
novel human oligonucleotide GAM25 PRECURSOR detected 
by the oligonucleotide detection system of the present in- 
vention; and 

[0184] Figs. 15B, 15C and 15D are pictures of laboratory results 
demonstrating laboratory confirmation of expression of 
known human oligonucleotide MIR98 and of novel bioin- 
formatically-detected human GAM25 RNA respectively, 
both of Fig. 15A, thus validating the bioinformatic 
oligonucleotide detection system of the present invention; 

[0185] Figs. 16A, 16B and 16C are schematic diagrams which,
when taken together, represent methods of designing
primers to identify specific hairpin oligonucleotides in
accordance with a preferred embodiment of the present
invention.

[0186] Fig. 17A is a simplified flowchart illustrating construction 
of a microarray constructed and operative to identify novel 
oligonucleotides of the present invention, in accordance 
with a preferred embodiment of the present invention; 

[0187] Fig. 17B is a simplified block diagram illustrating design 
of a microarray constructed and operative to identify novel 
oligonucleotides of the present invention, in accordance 
with a preferred embodiment of the present invention; 

[0188] Fig. 17C is a flowchart illustrating a mode of preparation 
and amplification of a cDNA library in accordance with a 
preferred embodiment of the present invention; 

[0189] Fig. 18A is a line graph showing results of detection of 
known microRNA oligonucleotides and of novel GAM 
oligonucleotides, using a microarray constructed and op- 
erative in accordance with a preferred embodiment of the 
present invention; 

[0190] Fig. 18B is a line graph showing specificity of hybridiza- 
tion of a microarray constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion; and 



[0191] Fig. 18C is a summary table demonstrating detection of 
known microRNA oligonucleotides using a microarray 
constructed and operative in accordance with a preferred 
embodiment of the present invention. 

[0192] Fig. 19 presents pictures of laboratory results
demonstrating laboratory confirmation of "dicing" of four novel
bioinformatically-detected HIV1 GAM PRECURSORs into
their corresponding mature GAM RNAs;

[0193] Fig. 20 presents pictures of laboratory results demon- 
strating laboratory confirmation of expression of two 
novel bioinformatically-detected Vaccinia GAM precursors, 
herein designated GAM501943 PRECURSORS and 
GAM501981 PRECURSORS. 

[0194] Fig. 21A is a picture of laboratory results demonstrating 
bands demarcating the presence of small, approximately 
60 nt-long oligonucleotides (representing 22 nt-long 
GAM RNA ligated to adaptors) in control H9 cells and H9 
cells infected with the HIV virus, thereby validating that 
HIV infection can alter the levels of efficacy of small, 22 
nt-long oligonucleotides of the present invention; 

[0195] Fig. 21B is a table of laboratory results validating
expression of novel human oligonucleotides in human cells
infected with HIV, that are detected by a bioinformatic
oligonucleotide detection engine constructed and operative
in accordance with a preferred embodiment of the
present invention, thereby validating its efficacy.

Brief Description of Sequences

[0196] A Sequence Listing of genomic sequences of the present 
invention designated SEQ ID NO:1 through SEQ ID NO:
4,204,915 is attached to this application, and is hereby
incorporated herein. The genomic listing comprises the 
following nucleotide sequences: nucleotide sequences of 
1,655 viral and 105,537 human GAM precursors of re- 
spective novel oligonucleotides of the present invention; 
nucleotide sequences of 117,017 human and 2,246 viral 
GAM RNA oligonucleotides of respective novel oligonu- 
cleotides of the present invention; and nucleotide se- 
quences of 527,821 human and 197,218 viral target gene 
binding sites of respective novel oligonucleotides of the 
present invention.

Detailed Description

[0197] Reference is now made to Fig. 1, which is a simplified
diagram describing a plurality of novel, bioinformatically-detected
oligonucleotides of the present invention, referred to
here as Genomic Address Messenger (GAM) oligonucleotides,
which modulate expression of respective target
genes whose function and utility are known in the art.

[0198] GAM is a novel, bioinformatically detectable, regulatory,
non-protein-coding, miRNA-like oligonucleotide. The
method by which GAM is detected is described with
additional reference to Figs. 1-8.

[0199] The GAM PRECURSOR is preferably encoded by a viral
genome. Alternatively or additionally, the GAM PRECURSOR
is encoded by the human genome. The GAM TARGET
GENE is a gene encoded by the human genome. Alternatively
or additionally, the GAM TARGET GENE is a gene
encoded by a viral genome.

[0200] The GAM PRECURSOR encodes a GAM PRECURSOR RNA.
Similar to other miRNA oligonucleotides, the GAM
PRECURSOR RNA does not encode a protein.

[0201] GAM PRECURSOR RNA folds onto itself, forming GAM
FOLDED PRECURSOR RNA, which has a two-dimensional
"hairpin" structure. As is well known in the art, this
"hairpin" structure is typical of RNA encoded by known miRNA
precursor oligonucleotides and is due to the full or partial
complementarity of the nucleotide sequence of the first
half of an miRNA precursor to the nucleotide sequence of
the second half thereof.



[0202] A complementary sequence is a sequence which is
reversed and wherein each nucleotide is replaced by a
complementary nucleotide, as is well known in the art (e.g.
ATGGC is the complementary sequence of GCCAT).
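By way of non-limiting illustration only, the definition above (reverse the sequence, then replace each nucleotide by its complement) may be expressed in a few lines of Python. The function name is an assumption made for this illustration, and the sketch handles the DNA alphabet only.

```python
# Minimal sketch; the function name is illustrative, not from the specification.
_PAIR = str.maketrans("ACGT", "TGCA")


def complementary_sequence(seq: str) -> str:
    """Reverse the sequence and replace each nucleotide by its complement."""
    return seq.upper()[::-1].translate(_PAIR)


print(complementary_sequence("AAGCT"))  # prints AGCTT
```

Applying the function twice returns the original sequence, which is a convenient sanity check.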

[0203] An enzyme complex composed of Dicer RNaseIII together
with other necessary proteins, designated DICER COMPLEX,
cuts the GAM FOLDED PRECURSOR RNA yielding a
single-stranded, ~22 nt-long RNA segment designated
GAM RNA.

[0204] GAM TARGET GENE encodes a corresponding messenger
RNA, designated GAM TARGET RNA. As is typical of mRNA
of a protein-coding gene, each of the GAM TARGET RNAs of the
present invention comprises three regions: a 5' untranslated
region, a protein-coding region and a 3' untranslated
region, designated 5'UTR, PROTEIN-CODING and
3'UTR, respectively.

[0205] GAM RNA binds complementarily to one or more target 
binding sites located in the untranslated regions of each 
of the GAM TARGET RNAs of the present invention. This 
complementary binding is due to the partial or full com- 
plementarity between the nucleotide sequence of GAM 
RNA and the nucleotide sequence of each of the target 
binding sites. As an illustration, Fig. 1 shows three such
target binding sites, designated BINDING SITE I, BINDING
SITE II and BINDING SITE III, respectively. It is appreciated 
that the number of target binding sites shown in Fig. 1 is 
only illustrative and that any suitable number of target 
binding sites may be present. It is further appreciated that 
although Fig. 1 shows target binding sites only in the 
3'UTR region, these target binding sites may instead be 
located in the 5'UTR region or in both the 3'UTR and 
5'UTR regions. 
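By way of non-limiting illustration only, a search for target binding sites that are fully or partially complementary to a GAM RNA may be sketched in Python as follows. This is a simplified, ungapped scan that counts Watson-Crick pairs only; the function name, the threshold parameter and the omission of G-U wobble pairs, bulges and binding free energy are all assumptions made for this illustration, and the sketch is not the target gene binding site detector of the present invention.

```python
from typing import List, Tuple

# Watson-Crick RNA pairs only; G-U wobble pairs are ignored in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_binding_sites(gam_rna: str, utr: str,
                       min_fraction: float = 0.8) -> List[Tuple[int, float]]:
    """Return (start, paired_fraction) for each UTR window that pairs with
    gam_rna at no less than min_fraction of positions (ungapped, antiparallel)."""
    gam = gam_rna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    n = len(gam)
    if n == 0:
        return []
    sites = []
    for start in range(len(utr) - n + 1):
        window = utr[start:start + n]
        # Antiparallel pairing: the first base of the window pairs with
        # the last base of the GAM RNA, and so on.
        paired = sum((w, g) in PAIRS for w, g in zip(window, reversed(gam)))
        fraction = paired / n
        if fraction >= min_fraction:
            sites.append((start, fraction))
    return sites
```

Setting min_fraction to 1.0 reports only fully complementary sites, while lower values admit partially complementary sites.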

[0206] The complementary binding of GAM RNA to target binding 
sites on GAM TARGET RNA, such as BINDING SITE I, BIND- 
ING SITE II and BINDING SITE III, inhibits the translation of 
each of the GAM TARGET RNAs of the present invention 
into respective GAM TARGET PROTEIN, shown surrounded 
by a broken line.

[0207] It is appreciated that the GAM TARGET GENE in fact repre- 
sents a plurality of GAM target genes. The mRNA of each 
one of this plurality of GAM target genes comprises one or 
more target binding sites, each having a nucleotide se- 
quence which is at least partly complementary to GAM 
RNA and which when bound by GAM RNA causes inhibition
of translation of the GAM target mRNA into a
corresponding GAM target protein.



[0208] The mechanism of the translational inhibition that is
exerted by GAM RNA on one or more GAM TARGET GENEs
may be similar or identical to the known mechanism of
translational inhibition exerted by known miRNA
oligonucleotides.

[0209] The nucleotide sequences of each of a plurality of GAM
oligonucleotides that are described by Fig. 1 and their
respective genomic sources and genomic locations are set
forth in Tables 1-3, hereby incorporated herein.

[0210] The nucleotide sequences of GAM PRECURSOR RNAs, and 
a schematic representation of a predicted secondary fold- 
ing of GAM FOLDED PRECURSOR RNAs, of each of a plural- 
ity of GAM oligonucleotides that are described by Fig. 1 
are set forth in Table 4, hereby incorporated herein. 

[0211] The nucleotide sequences of "diced" GAM RNAs of each of 
a plurality of GAM oligonucleotides that are described by 
Fig. 1 are set forth in Table 5, hereby incorporated herein. 

[0212] The nucleotide sequences of target binding sites, such as 
BINDING SITE I, BINDING SITE II and BINDING SITE III that 
are found on GAM TARGET RNAs of each of a plurality of 
GAM oligonucleotides that are described by Fig. 1, and a 
schematic representation of the complementarity of each 
of these target binding sites to each of a plurality of GAM
RNAs that are described by Fig. 1 are set forth in Tables
6-7, hereby incorporated herein.

[0213] It is appreciated that the specific functions and accord- 
ingly the utilities of each of a plurality of GAM oligonu- 
cleotides that are described by Fig. 1 are correlated with 
and may be deduced from the identity of the GAM TARGET 
GENES inhibited thereby, and whose functions are set 
forth in Table 8, hereby incorporated herein. 

[0214] Studies documenting the well known correlations between 
each of a plurality of GAM TARGET GENEs that are de- 
scribed by Fig. 1 and the known gene functions and re- 
lated diseases are listed in Table 9, hereby incorporated 
herein. 

[0215] The present invention discloses a novel group of viral and 
human oligonucleotides, belonging to the miRNA-like 
oligonucleotide group, here termed GAM oligonucleotides, 
for which a specific complementary binding has been de- 
termined bioinformatically. 

[0216] Reference is now made to Fig. 2, which is a simplified
block diagram illustrating a bioinformatic oligonucleotide
detection system and method constructed and operative 
in accordance with a preferred embodiment of the present 
invention. 



[0217] An important feature of the present invention is a bioin- 
formatic oligonucleotide detection engine 100, which is 
capable of bioinformatically detecting oligonucleotides of 
the present invention. 

[0218] The functionality of the bioinformatic oligonucleotide de- 
tection engine 100 includes receiving expressed RNA data 
102, sequenced DNA data 104, and protein function data 
106; performing a complex process of analysis of this
data as elaborated hereinbelow; and, based on this analysis,
providing information, designated by reference numeral
108, identifying and describing features of novel
oligonucleotides.

[0219] Expressed RNA data 102 comprises published expressed 
sequence tags (EST) data, published mRNA data, as well as 
other published RNA data. Sequenced DNA data 104 com- 
prises alphanumeric data representing genomic se- 
quences and preferably including annotations such as in- 
formation indicating the location of known protein-coding 
regions relative to the genomic sequences. 

[0220] Protein function data 106 comprises information from sci- 
entific publications e.g. physiological functions of known 
proteins and their connection, involvement and possible 
utility in treatment and diagnosis of various diseases. 



[0221] Expressed RNA data 102 and sequenced DNA data 104
may preferably be obtained from data published by the
National Center for Biotechnology Information (NCBI) at
the National Institutes of Health (NIH) (Jenuth J.P.,
Methods Mol. Biol. 132:301-312 (2000), herein incorporated
by reference) as well as from various other published
data sources. Protein function data 106 may preferably
be obtained from any one of numerous relevant published
data sources, such as the Online Mendelian Inheritance
in Man (OMIM(TM), Hamosh et al., Nucleic
Acids Res. 30: 52-55 (2002)) database developed by Johns
Hopkins University, and also published by NCBI (2000).

[0222] Prior to or during actual detection of bioinformatically-de- 
tected group of novel oligonucleotides 108 by the bioin- 
formatic oligonucleotide detection engine 100, bioinfor- 
matic oligonucleotide detection engine training & valida- 
tion functionality 110 is operative. This functionality uses 
one or more known miRNA oligonucleotides as a training
set to train the bioinformatic oligonucleotide detection
engine 100 to bioinformatically recognize miRNA-like
oligonucleotides, and their respective potential target
binding sites. Bioinformatic oligonucleotide detection
engine training & validation functionality 110 is further
described hereinbelow with reference to Fig. 3.

[0223] The bioinformatic oligonucleotide detection engine 100 
preferably comprises several modules which are prefer- 
ably activated sequentially, and are described as follows: 

[0224] A non-protein-coding genomic sequence detector 112 
operative to bioinformatically detect non-protein-coding 
genomic sequences. The non-protein-coding genomic se- 
quence detector 112 is further described herein below 
with reference to Figs. 4A and 4B. 

[0225] A hairpin detector 114 operative to bioinformatically de- 
tect genomic "hairpin-shaped" sequences, similar to GAM 
FOLDED PRECURSOR RNA (Fig. 1). The hairpin detector 
114 is further described herein below with reference to 
Figs. 5A and 5B. 

[0226] A Dicer-cut location detector 116 operative to
bioinformatically detect the location on a GAM FOLDED
PRECURSOR RNA which is enzymatically cut by DICER COMPLEX
(Fig. 1), yielding "diced" GAM RNA. The Dicer-cut location 
detector 116 is further described herein below with refer- 
ence to Figs. 6A-6C. 

[0227] A target gene binding site detector 118 operative to
bioinformatically detect target genes having binding sites,
the nucleotide sequence of which is partially complementary
to that of a given genomic sequence, such as a
nucleotide sequence cut by DICER COMPLEX. The target gene
binding site detector 118 is further described hereinbelow 
with reference to Figs. 7A and 7B. 

[0228] A function & utility analyzer, designated by reference nu- 
meral 120, is operative to analyze the function and utility 
of target genes in order to identify target genes which 
have a significant clinical function and utility. The function 
& utility analyzer 120 is further described hereinbelow 
with reference to Fig. 8.
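By way of non-limiting illustration only, the sequential activation of the modules listed above may be sketched in Python as a chain of stages, each receiving the output of the preceding stage. The function name and the placeholder stages are assumptions made for this illustration; each placeholder simply passes its input through and stands in for the corresponding module of Fig. 2.

```python
from typing import Any, Callable, Iterable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_detection_engine(genomic_data: Any,
                         stages: Iterable[Stage]) -> Tuple[Any, List[str]]:
    """Apply each stage to the output of the previous one, in order.

    Returns the final result and the list of stage names that were run.
    """
    result = genomic_data
    executed = []
    for name, stage in stages:
        result = stage(result)
        executed.append(name)
    return result, executed


def passthrough(data: Any) -> Any:
    """Hypothetical placeholder for a real module; returns its input unchanged."""
    return data


# Stage names follow the modules of Fig. 2.
STAGES = [
    ("non-protein-coding genomic sequence detector 112", passthrough),
    ("hairpin detector 114", passthrough),
    ("Dicer-cut location detector 116", passthrough),
    ("target gene binding site detector 118", passthrough),
    ("function & utility analyzer 120", passthrough),
]
```

Keeping the stages in an ordered list makes the order of activation explicit and allows a stage to be replaced without changing the driver function.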

[0229] According to an embodiment of the present invention, the 
bioinformatic oligonucleotide detection engine 100 may 
employ a cluster of 40 personal computers (PCs; XEON (R), 
2.8GHz, with 80GB storage each) connected by Ethernet to 
eight servers (2-CPU, XEON (TM) 1.2-2.2GHz, with
~200GB storage each) and combined with an 8-processor
server (8-CPU, Xeon 550MHz w/ 8GB RAM) connected via
2 HBA fiber-channels to an EMC CLARIION (TM) 
100-disks, 3.6 Terabyte storage device. A preferred em- 
bodiment of the present invention may also preferably 
comprise software that utilizes a commercial database 
software program, such as MICROSOFT (TM) SQL Server 
2000. 



[0230] According to a preferred embodiment of the present
invention, the bioinformatic oligonucleotide detection
engine 100 may employ a cluster of 80 servers (XEON (R),
2.8GHz, with 80GB storage each) connected by Ethernet to
eight servers (2-CPU, XEON (TM) 1.2-2.2GHz, with
~200GB storage each) and combined with a storage device
(Promise Technology Inc., RM8000) having
8 disks, 2 Terabytes in total. A preferred embodiment of the
present invention may also preferably comprise software 
that utilizes a commercial database software program, 
such as MICROSOFT (TM) SQL Server 2000. It is appreci- 
ated that the abovementioned hardware configuration is 
not meant to be limiting and is given as an illustration 
only. The present invention may be implemented in a wide 
variety of hardware and software configurations. 

[0231] The present invention discloses 1,655 viral and 105,537 
human novel oligonucleotides of the GAM group of 
oligonucleotides, which have been detected bioinformati- 
cally and 190 viral and 14,813 human novel polynu- 
cleotides of the GR group of polynucleotides, which have 
been detected bioinformatically. Laboratory confirmation 
of bioinformatically predicted oligonucleotides of the GAM 
group of oligonucleotides, and several bioinformatically
predicted polynucleotides of the GR group of
polynucleotides, is described hereinbelow with reference to Figs.
13-15D, Fig. 18 and Table 12.

[0232] Laboratory confirmation of bioinformatically predicted 
oligonucleotides of the viral GAM group of oligonu- 
cleotides, and several bioinformatically predicted viral 
polynucleotides of the GR group of polynucleotides, is de- 
scribed hereinbelow with reference to Figs. 19-20. 

[0233] Reference is now made to Fig. 3, which is a simplified
flowchart illustrating operation of a preferred embodiment
of the bioinformatic oligonucleotide detection engine 
training & validation functionality 110 described herein- 
above with reference to Fig. 2. 

[0234] Bioinformatic oligonucleotide detection engine training &
validation functionality 110 begins by training the
bioinformatic oligonucleotide detection engine 100 (Fig. 2) to
recognize one or more known miRNA oligonucleotides, as
designated by reference numeral 122. This training step
comprises hairpin detector training & validation
functionality 124, further described hereinbelow with reference to
Fig. 5A, Dicer-cut location detector training & validation
functionality 126, further described hereinbelow with
reference to Figs. 6A and 6B, and target gene binding site
detector training & validation functionality 128, further
described hereinbelow with reference to Fig. 7A.

[0235] Next, the bioinformatic oligonucleotide detection engine 
training & validation functionality 110 is operative to
bioinformatically detect novel oligonucleotides, using
bioinformatic oligonucleotide detection engine 100 (Fig. 2), as
designated by reference numeral 130. Wet lab experi- 
ments are preferably conducted in order to validate ex- 
pression and preferably function of some samples of the 
novel oligonucleotides detected by the bioinformatic 
oligonucleotide detection engine 100, as designated by 
reference numeral 132. Figs. 13A-15D, Fig. 18 and Table 
12 illustrate examples of wet lab validation of sample 
novel human oligonucleotides bioinformatically-detected 
in accordance with a preferred embodiment of the present 
invention. Laboratory confirmation of bioinformatically 
predicted oligonucleotides of the viral GAM group of 
oligonucleotides, and several bioinformatically predicted 
viral polynucleotides of the GR group of polynucleotides, 
is described hereinbelow with reference to Figs. 19-20. 

[0236] Reference is now made to Fig. 4A, which is a simplified
block diagram of a preferred implementation of the
non-protein-coding genomic sequence detector 112 described
hereinabove with reference to Fig. 2. The
non-protein-coding genomic sequence detector 112 preferably
receives at least two types of published genomic data: Ex- 
pressed RNA data 102 and sequenced DNA data 104. The 
expressed RNA data 102 may include, inter alia, EST data, 
EST clusters data, EST genome alignment data and mRNA 
data. Sources for expressed RNA data 102 include NCBI 
dbEST, NCBI UniGene clusters and mapping data, and TIGR
gene indices (Kirkness F. and Kerlavage, A.R., Methods 
Mol. Biol. 69:261-268 (1997)). Sequenced DNA data 104 
may include sequence data (FASTA format files), and fea- 
ture annotations (GenBank file format) mainly from NCBI 
databases. Based on the abovementioned input data, the 
non-protein-coding genomic sequence detector 112 pro- 
duces a plurality of non-protein-coding genomic se- 
quences 136. Preferred operation of the non-pro- 
tein-coding genomic sequence detector 112 is described 
hereinbelow with reference to Fig. 4B.

[0237] Reference is now made to Fig. 4B, which is a simplified
flowchart illustrating a preferred operation of the non- 
protein-coding genomic sequence detector 112 of Fig. 2. 
Detection of non-protein-coding genomic sequences 136
preferably progresses along one of the following
two paths:

[0238] A first path for detecting non-protein-coding genomic
sequences 136 (Fig. 4A) begins with receipt of a plurality
of known RNA sequences, such as EST data. Each RNA
sequence is first compared with known protein-coding DNA
sequences, in order to select only those RNA sequences
which are non-protein-coding, i.e. intergenic or intronic
sequences. This can preferably be performed by using one
of many alignment algorithms known in the art, such as
BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)).
This sequence comparison preferably also provides local- 
ization of the RNA sequence on the DNA sequences. 

[0239] Alternatively, selection of non-protein-coding RNA se- 
quences and their localization on the DNA sequences can 
be performed by using publicly available EST cluster data 
and genomic mapping databases, such as the UNIGENE 
database published by NCBI or the TIGR database. Such 
databases map expressed RNA sequences to the DNA
sequences encoding them, find the correct orientation of
EST sequences, and indicate mapping of ESTs to protein- 
coding DNA regions, as is well known in the art. Public 
databases, such as TIGR, may also be used to map an EST 
to a cluster of ESTs, known in the art as Tentative Human
Consensus and assumed to be expressed as one segment.
Publicly available genome annotation databases, such as 
NCBI's GenBank, may also be used to deduce expressed 
intronic sequences. 

[0240] Optionally, an attempt may be made to "expand" the
non-protein-coding RNA sequences thus found, by searching for
transcription start and end signals, respectively upstream and
downstream of the location of the RNA on the DNA, as is 
well known in the art. 

[0241] A second path for detecting non-protein-coding genomic 
sequences 136 (Fig. 4A) begins with receipt of DNA se- 
quences. The DNA sequences are parsed into non- 
protein-coding sequences, using published DNA annota- 
tion data, by extracting those DNA sequences which are 
between known protein-coding sequences. Next, tran- 
scription start and end signals are sought. If such signals 
are found, and depending on their robustness, probable 
expressed non-protein-coding genomic sequences are 
obtained. 
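By way of non-limiting illustration only, the parsing of a DNA sequence into non-protein-coding segments using annotation data may be sketched in Python as follows. The function name and the coordinate convention (0-based, half-open intervals) are assumptions made for this illustration. The sketch returns every stretch not covered by an annotated protein-coding interval, including the flanks before the first and after the last such interval, and does not perform the subsequent search for transcription start and end signals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def non_coding_intervals(seq_length: int,
                         coding: List[Interval]) -> List[Interval]:
    """Return the intervals of [0, seq_length) not covered by any coding interval."""
    gaps = []
    cursor = 0  # first position not yet known to be coding
    for start, end in sorted(coding):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping annotations are merged implicitly
    if cursor < seq_length:
        gaps.append((cursor, seq_length))
    return gaps
```

For example, with a sequence of length 100 and coding intervals (10, 20), (15, 30) and (50, 60), the function returns (0, 10), (30, 50) and (60, 100).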

[0242] Such an approach is especially useful for identifying novel
GAM oligonucleotides which are found in proximity to
other known miRNA oligonucleotides, or other wet lab
validated GAM oligonucleotides. Since, as described
hereinbelow with reference to Fig. 9, GAM oligonucleotides are
frequently found in clusters, sequences located near
known miRNA oligonucleotides are more likely to contain
novel GAM oligonucleotides. Optionally, sequence orthology,
i.e. sequence conservation in an evolutionarily related
species, may be used to select genomic sequences having
a relatively high probability of containing expressed novel
GAM oligonucleotides.
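By way of non-limiting illustration only, the selection of candidate genomic sequences located near known miRNA oligonucleotides may be sketched in Python as follows. The function name and the distance parameter are assumptions made for this illustration; the specification does not state a distance threshold, so the caller must supply one.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def candidates_near_known(candidates: List[Interval],
                          known_mirna_positions: List[int],
                          max_distance: int) -> List[Interval]:
    """Keep candidate intervals lying within max_distance bases of any
    known miRNA position on the same genomic sequence."""
    selected = []
    for start, end in candidates:
        for pos in known_mirna_positions:
            # Distance from a point to the interval; zero if the point is inside.
            if pos < start:
                distance = start - pos
            elif pos >= end:
                distance = pos - (end - 1)
            else:
                distance = 0
            if distance <= max_distance:
                selected.append((start, end))
                break  # one nearby known miRNA is enough
    return selected
```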

[0243] It is appreciated that in detecting non-human GAM
oligonucleotides of the present invention, the bioinformatic
oligonucleotide detection engine 100 utilizes the
input genomic sequences, without filtering protein-coding 
regions detected by the non-protein-coding genomic se- 
quence detector 112. Hence, non-protein-coding ge- 
nomic sequences 136 refers to GENOMIC SEQUENCES only. 

[0244] Reference is now made to Fig. 5A, which is a simplified 
block diagram of a preferred implementation of the hair- 
pin detector 114 described hereinabove with reference to 
Fig. 2. 

[0245] The goal of the hairpin detector 114 is to detect hairpin-
shaped genomic sequences, similar to those of known
miRNA oligonucleotides. A hairpin-shaped genomic sequence
is a genomic sequence having a first half which is
at least partially complementary to a second half thereof,
which causes the halves to fold onto each other, thereby
forming a hairpin structure, as mentioned hereinabove
with reference to Fig. 1.

[0246] The hairpin detector 114 (Fig. 2) receives a plurality of 
non-protein-coding genomic sequences 136 (Fig. 4A). 
Following operation of hairpin detector training & valida- 
tion functionality 124 (Fig. 3), the hairpin detector 114 is 
operative to detect and output hairpin-shaped sequences, 
which are found in the non-protein-coding genomic se- 
quences 136. The hairpin-shaped sequences detected by 
the hairpin detector 114 are designated hairpin structures 
on genomic sequences 138. A preferred mode of opera- 
tion of the hairpin detector 114 is described hereinbelow 
with reference to Fig. 5B. 

[0247] The hairpin detector training & validation functionality 124
includes an iterative process of applying the hairpin detector
114 to known hairpin-shaped miRNA precursor sequences,
calibrating the hairpin detector 114 such that it
identifies a training set of known hairpin-shaped miRNA
precursor sequences, as well as other similarly hairpin-
shaped sequences. In a preferred embodiment of the
present invention, the hairpin detector training & validation
functionality 124 trains the hairpin detector 114 and
validates each of the steps of operation thereof described
hereinbelow with reference to Fig. 5B.

[0248] The hairpin detector training & validation functionality
124 preferably uses two sets of data: the aforesaid training
set of known hairpin-shaped miRNA precursor sequences,
such as hairpin-shaped miRNA precursor sequences
of 440 miRNA oligonucleotides of H. sapiens, M.
musculus, C. elegans, C. briggsae and D. melanogaster,
annotated in the RFAM database (Griffiths-Jones 2003),
and a background set of about 1000 hairpin-shaped sequences
found in expressed non-protein-coding human
genomic sequences. The background set is expected to
comprise some valid, previously undetected hairpin-
shaped miRNA-like precursor sequences, and many hairpin-
shaped sequences which are not hairpin-shaped
miRNA-like precursors.

[0249] In a preferred embodiment of the present invention the
efficacy of the hairpin detector 114 (Fig. 2) is confirmed.
For example, when a similarity threshold is chosen such
that 87% of the known hairpin-shaped miRNA precursors
are successfully predicted, only 21.8% of the approximately
1000 hairpin-shaped sequences of the background set are
predicted to be hairpin-shaped miRNA-like precursors.
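
By way of illustration only, the choice of a similarity threshold that retains a chosen fraction of the known precursors, and the resulting fraction of background hairpins accepted, may be sketched as follows. The function names and the toy scores are hypothetical and are not taken from the computer program listing; any real scores would come from the similarity score described hereinbelow.

```python
import math


def threshold_for_recall(known_scores, recall):
    """Return the largest threshold at which at least `recall` of the
    known precursor scores are still accepted (score >= threshold)."""
    ranked = sorted(known_scores, reverse=True)
    keep = max(1, min(len(ranked), math.ceil(recall * len(ranked))))
    return ranked[keep - 1]


def fraction_accepted(scores, threshold):
    """Fraction of the given scores that reach the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy data (assumed): scores of known precursors and of background hairpins.
known = [0.9, 0.8, 0.2, 0.1]
background = [0.85, 0.1, 0.3, 0.2]
t = threshold_for_recall(known, 0.5)
print(t, fraction_accepted(known, t), fraction_accepted(background, t))
```

The same two numbers, computed on the training set and on the background set, correspond to the 87% and 21.8% figures quoted above.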

[0250] Reference is now made to Fig. 5B, which is a simplified 
flowchart illustrating preferred operation of the hairpin 
detector 114 of Fig. 2. The hairpin detector 114 preferably 
initially uses a secondary structure folding algorithm 
based on free-energy minimization, such as the MFOLD 
algorithm, described in Mathews et al. J. Mol. Biol. 
288:911-940 (1999) and Zuker, M. Nucleic Acids Res. 31: 
3406-3415 (2003), the disclosure of which is hereby in- 
corporated by reference. This algorithm is operative to 
calculate probable secondary structure folding patterns of 
the non-protein-coding genomic sequences 136 (Fig. 4A) 
as well as the free-energy of each of these probable sec- 
ondary folding patterns. The secondary structure folding 
algorithm, such as the MFOLD algorithm (Mathews, 1999;
Zuker 2003), typically provides a listing of the base- 
pairing of the folded shape, i.e. a listing of each pair of 
connected nucleotides in the sequence. 

[0251] Next, the hairpin detector 114 analyzes the results of the
secondary structure folding patterns, in order to determine
the presence and location of hairpin folding structures.
The goal of this second step is to assess the base-
pairing listing provided by the secondary structure folding
algorithm, in order to determine whether the base-pairing
listing describes one or more hairpin type bonding patterns.
Preferably, each sequence segment corresponding to a
hairpin structure is then separately analyzed by the secondary
structure folding algorithm in order to determine
its exact folding pattern and free-energy.
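
By way of illustration only, locating hairpin type bonding patterns in a base-pairing listing may be sketched as follows. The sketch assumes the folding result has already been converted to dot-bracket notation, which is an assumption and not necessarily the output format of the folding algorithm used; the function name is hypothetical. A hairpin is taken to be a terminal loop together with the unbranched stem around it, allowing bulges and internal loops.

```python
def find_hairpins(dot_bracket):
    """Return (start, end) spans, 0-based inclusive, one per hairpin."""
    partner, stack = {}, []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    n = len(dot_bracket)
    hairpins = []
    for i, j in sorted(partner.items()):
        # A terminal loop is closed by a pair with no paired base inside it.
        if i < j and all(k not in partner for k in range(i + 1, j)):
            a, b = i, j
            while True:
                # Step outward over unpaired bases on both sides.
                p, q = a - 1, b + 1
                while p >= 0 and p not in partner:
                    p -= 1
                while q < n and q not in partner:
                    q += 1
                # Extend only if the next paired bases pair with each other.
                if p >= 0 and q < n and partner[p] == q:
                    a, b = p, q
                else:
                    break
            hairpins.append((a, b))
    return hairpins


print(find_hairpins("((...))..((....))"))
```

Each span returned could then be refolded on its own, as described in the preceding paragraph.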

[0252] The hairpin detector 114 then assesses the hairpin structures
found by the previous step, comparing them to hairpin
structures of known miRNA precursors, using various
characteristic hairpin structure features such as free-
energy and thermodynamic stability, the amount and
type of mismatched nucleotides, the existence of sequence
repeat-elements, the number of mismatched nucleotides
in positions 18-22 counting from the loop, and the
percentage of G nucleotides. Only hairpins that bear statistically
significant resemblance to the training set of hairpin
structures of known miRNA precursors, according to the
abovementioned parameters, are accepted.

[0253] In a preferred embodiment of the present invention, similarity
to the training set of hairpin structures of known
miRNA precursors is determined using a "similarity score"
which is calculated using a multiplicity of terms, where
each term is a function of one of the abovementioned
hairpin structure features. The parameters of each function
are found heuristically from the set of hairpin structures
of known miRNA precursors, as described hereinabove
with reference to hairpin detector training & validation
functionality 124 (Fig. 3). The selection of the features
and their function parameters is optimized so as to
achieve maximized separation between the distribution of
similarity scores of validated miRNA precursor hairpin structures,
and the distribution of similarity scores of hairpin
structures detected in the background set mentioned
hereinabove with reference to Fig. 5B.
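
By way of illustration only, a similarity score built from one term per feature may be sketched as follows. The text does not specify the form of each term; this sketch assumes, purely as a stand-in, a Gaussian-style penalty whose mean and standard deviation are estimated from the training set. The feature names and values are hypothetical.

```python
from statistics import mean, stdev


def fit_terms(training_features):
    """Estimate (mean, standard deviation) per feature from the training set."""
    names = training_features[0].keys()
    return {
        name: (mean(f[name] for f in training_features),
               stdev(f[name] for f in training_features))
        for name in names
    }


def similarity_score(features, terms):
    """Sum of per-feature terms; higher means closer to the training set."""
    score = 0.0
    for name, (mu, sigma) in terms.items():
        z = (features[name] - mu) / (sigma or 1.0)  # guard against sigma == 0
        score += -0.5 * z * z
    return score


# Toy training set (assumed values).
training = [
    {"free_energy": -30, "length": 80},
    {"free_energy": -34, "length": 84},
    {"free_energy": -32, "length": 82},
]
terms = fit_terms(training)
print(similarity_score({"free_energy": -31, "length": 83}, terms))
```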
[0254] In an alternative preferred embodiment of the present in- 
vention, the step described in the preceding paragraph 
may be split into two stages. A first stage implements a 
simplified scoring method, typically based on threshold- 
ing a subset of the hairpin structure features described 
hereinabove, and may employ a minimum threshold for 
hairpin structure length and a maximum threshold for 
free-energy. A second stage is preferably more stringent, 
and preferably employs a full calculation of the weighted 
sum of terms described hereinabove. The second stage 
preferably is performed only on the subset of hairpin 
structures that survived the first stage. 
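
By way of illustration only, the two-stage arrangement may be sketched as follows: inexpensive thresholds are applied first, and the full score is computed only for the survivors. The threshold values, the dictionary keys and the function name are hypothetical.

```python
def two_stage_filter(hairpins, full_score, min_length=55,
                     max_free_energy=-20.0, min_score=0.0):
    """Stage 1: cheap thresholds. Stage 2: full score on survivors only."""
    survivors = [
        h for h in hairpins
        if h["length"] >= min_length and h["free_energy"] <= max_free_energy
    ]
    return [h for h in survivors if full_score(h) >= min_score]


# Toy data (assumed); "s" stands in for a precomputed full similarity score.
candidates = [
    {"name": "a", "length": 80, "free_energy": -30.0, "s": 1.0},
    {"name": "b", "length": 40, "free_energy": -35.0, "s": 5.0},
    {"name": "c", "length": 90, "free_energy": -25.0, "s": -1.0},
]
print([h["name"] for h in two_stage_filter(candidates, lambda h: h["s"])])
```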



[0255] The hairpin detector 114 also attempts to select hairpin 
structures whose thermodynamic stability is similar to 
that of hairpin structures of known miRNA precursors.
This may be achieved in various ways. A preferred em- 
bodiment of the present invention utilizes the following 
methodology, preferably comprising three logical steps: 

[0256] First, the hairpin detector 114 attempts to group hairpin
structures into "families" of closely related hairpin structures.
As is known in the art, a secondary structure folding
algorithm typically provides multiple alternative folding
patterns for a given genomic sequence and indicates
the free-energy of each alternative folding pattern. It is a
particular feature of the present invention that the hairpin
detector 114 preferably assesses the various hairpin
structures appearing in the various alternative folding
patterns and groups hairpin structures which appear at
identical or similar sequence locations in various alternative
folding patterns into common sequence location
based "families" of hairpins. For example, all hairpin
structures whose centers are within 7 nucleotides of each
other may be grouped into a "family". Hairpin structures
may also be grouped into a "family" if their nucleotide sequences
are identical or overlap to a predetermined degree.
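
By way of illustration only, grouping hairpins into location based families may be sketched as follows. Hairpins are given as hypothetical (start, end) spans; they are sorted by center, and a hairpin joins the current family while its center lies within 7 nucleotides of the first center of that family, so that all members are within 7 nucleotides of each other.

```python
def group_into_families(hairpins, max_center_gap=7):
    """Group (start, end) spans whose centers lie within max_center_gap."""
    families = []
    for start, end in sorted(hairpins, key=lambda h: (h[0] + h[1]) / 2):
        center = (start + end) / 2
        if families and center - families[-1]["first_center"] <= max_center_gap:
            families[-1]["members"].append((start, end))
        else:
            families.append({"first_center": center, "members": [(start, end)]})
    return [f["members"] for f in families]


print(group_into_families([(100, 180), (102, 184), (300, 380), (104, 182)]))
```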

[0257] It is also a particular feature of the present invention that
the hairpin structure "families" are assessed in order to
select only those families which represent hairpin structures
that are as thermodynamically stable as those of
hairpin structures of known miRNA precursors. Preferably
only families which are represented in at least a selected
majority of the alternative secondary structure folding
patterns, typically 65%, 80% or 100%, are considered to be
sufficiently stable. Our tests suggest that only about 50%
of the hairpin structures predicted by the MFOLD algorithm
with default parameters are members of sufficiently
stable families, compared to about 90% of the hairpin
structures that contain known miRNAs. This percentage
depends on the size of the fragment that was folded. In an
alternative embodiment of the present invention, fragments
of 1000 nts are used as the preferred size. Other embodiments
use other sizes of genomic sequences, or a more
or less strict demand for representation in the alternative
secondary structure folding patterns.
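
By way of illustration only, the stability test for a family may be sketched as the fraction of alternative folding patterns in which a member of the family appears. Each folding is represented here as a hypothetical list of (start, end) hairpin spans, and membership is judged by center position; the 65% default follows the typical value given above.

```python
def family_support(family_center, foldings, max_center_gap=7):
    """Fraction of alternative foldings containing a hairpin of the family."""
    hits = sum(
        any(abs((s + e) / 2 - family_center) <= max_center_gap
            for s, e in folding)
        for folding in foldings
    )
    return hits / len(foldings)


def is_stable(family_center, foldings, min_support=0.65):
    """True if the family appears in at least min_support of the foldings."""
    return family_support(family_center, foldings) >= min_support


# Toy data (assumed): four alternative foldings of one sequence.
foldings = [[(100, 180)], [(102, 182), (300, 360)], [(300, 360)], [(98, 178)]]
print(family_support(140, foldings), is_stable(140, foldings))
```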

[0258] It is an additional particular feature of the present invention
that the most suitable hairpin structure is selected
from each selected family. For example, a hairpin structure
which has the greatest similarity to the hairpin structures
appearing in alternative folding patterns of the family
may be preferred. Alternatively or additionally, the
hairpin structures having relatively low free-energy may
be preferred.

[0259] Alternatively or additionally, considerations of homology to
hairpin structures of other organisms and the existence of 
clusters of thermodynamically stable hairpin structures 
located adjacent to each other along a sequence may be 
important in selection of hairpin structures. The tightness 
of the clusters in terms of their location and the occur- 
rence of both homology and clusters may be of signifi- 
cance. 

[0260] Reference is now made to Figs. 6A-6C, which together 
describe the structure and operation of the Dicer-cut lo- 
cation detector 116, described hereinabove with reference 
to Fig. 2. 

[0261] Reference is now made to Fig. 6A, which is a simplified
block diagram of a preferred implementation of the Dicer-
cut location detector 116. The goal of the Dicer-cut location
detector 116 is to detect the location in which the
DICER COMPLEX, described hereinabove with reference to
Fig. 1, dices GAM FOLDED PRECURSOR RNA, yielding GAM
RNA.

[0262] The Dicer-cut location detector 116 therefore receives a
plurality of hairpin structures on genomic sequences, designated
by reference numeral 138 (Fig. 5A), and following
operation of Dicer-cut location detector training & validation
functionality 126 (Fig. 3), is operative to detect a plurality
of Dicer-cut sequences from hairpin structures, designated
by reference numeral 140.

[0263] Reference is now made to Fig. 6B, which is a simplified
flowchart illustrating a preferred implementation of Dicer-
cut location detector training & validation functionality
126.

[0264] A general goal of the Dicer-cut location detector training 
& validation functionality 126 is to analyze the Dicer-cut 
locations of known diced miRNA on respective hairpin- 
shaped miRNA precursors in order to determine a com- 
mon pattern in these locations, which can be used to pre- 
dict Dicer-cut locations on GAM folded precursor RNAs. 

[0265] The Dicer-cut locations of known miRNA precursors are 
obtained and studied. Locations of the 5' and/or 3' ends 
of the known diced miRNA oligonucleotides are preferably 
represented by their respective distances from the 5' end 
of the corresponding hairpin-shaped miRNA precursor.
Additionally or alternatively, the 5' and/or 3' ends of the
known diced miRNA oligonucleotides are preferably rep- 
resented by the relationship between their locations and 
the locations of one or more nucleotides along the hair- 
pin-shaped miRNA precursor. Additionally or alternatively, 
the 5' and/or 3' ends of the known diced miRNA oligonu- 
cleotides are preferably represented by the relationship 
between their locations and the locations of one or more 
bound nucleotide pairs along the hairpin-shaped miRNA 
precursor. Additionally or alternatively, the 5' and/or 3' 
ends of the known diced miRNA oligonucleotides are 
preferably represented by the relationship between their 
locations and the locations of one or more mismatched 
nucleotide pairs along the hairpin-shaped miRNA precur- 
sor. Additionally or alternatively, the 5' and/or 3' ends of 
the known diced miRNA oligonucleotides are preferably 
represented by the relationship between their locations 
and the locations of one or more unmatched nucleotides 
along the hairpin-shaped miRNA precursor. Additionally 
or alternatively, locations of the 5' and/or 3' ends of the 
known diced miRNA oligonucleotides are preferably represented
by their respective distances from the loop located
at the center of the corresponding hairpin-shaped
miRNA precursor.

[0266] One or more of the foregoing location metrics may be 
employed in the Dicer-cut location detector training & 
validation functionality 126. Additionally, metrics related 
to the nucleotide content of the diced miRNA and/or of 
the hairpin-shaped miRNA precursor may be employed. 

[0267] In a preferred embodiment of the present invention,
Dicer-cut location detector training & validation functionality
126 preferably employs standard machine learning
techniques known in the art to analyze existing patterns
in a given "training set" of examples. Standard machine
learning techniques are capable, to a certain degree, of
detecting patterns in examples to which they have not been
previously exposed that are similar to those in the
"training set". Such machine learning techniques include,
but are not limited to, neural networks, Bayesian Modeling,
Bayesian Networks, Support Vector Machines (SVM),
Genetic Algorithms, Markovian Modeling, Maximum
Likelihood Modeling, Nearest Neighbor Algorithms,
Decision Trees and other techniques, as is well-known in
the art.

[0268] In accordance with an embodiment of the present invention,
two or more classifiers or predictors based on the
abovementioned machine learning techniques are separately
trained on the abovementioned training set, and are
used jointly in order to predict the Dicer-cut location. As
an example, Fig. 6B illustrates operation of two classifiers,
a 3' end recognition classifier and a 5' end recognition
classifier. Most preferably, the Dicer-cut location detector
training & validation functionality 126 implements a "best-
of-breed" approach employing a pair of classifiers based
on the abovementioned Bayesian Modeling and Nearest
Neighbor Algorithms, and accepting only "potential GAM
RNAs" that score highly on one of these predictors. In this
context, "high scores" means scores that have been
demonstrated to have a low false positive rate when scoring
known miRNA oligonucleotides. Alternatively, the
Dicer-cut location detector training & validation functionality
126 may implement operation of more or fewer than
two classifiers.

[0269] Predictors used in a preferred embodiment of the present 
invention are further described hereinbelow with reference 
to Fig. 6C. A computer program listing of a computer pro- 
gram implementation of the Dicer-cut location detector 
training & validation functionality 126 is enclosed on an 
electronic medium in computer-readable form, and is
hereby incorporated by reference herein.

[0270] When evaluated on the abovementioned validation set of 
440 published miRNA oligonucleotides using k-fold cross 
validation (Mitchell, 1997) with k = 3, the performance of 
the resulting predictors is as follows: In 70% of known 
miRNA oligonucleotides, a 5' end location is correctly de- 
termined by a Support Vector Machine predictor within up 
to two nucleotides; a Nearest Neighbor (EDIT DISTANCE) 
predictor achieves 56% accuracy (247/440); and a Two- 
Phased Predictor that uses Bayesian modeling (TWO 
PHASED) achieves 80% accuracy (352/440) when only the 
first phase is used. When the second phase (strand choice) 
is implemented by a naive Bayesian model, the accuracy is 
55% (244/440), and when the K-nearest-neighbor model- 
ing is used for the second phase, 374/440 decisions are 
made and the accuracy is 65% (242/374). A K-near- 
est-neighbor predictor (FIRST-K) achieves 61% accuracy 
(268/440). The accuracies of all predictors are consider- 
ably higher on top-scoring subsets of published miRNA 
oligonucleotides. 

[0271] Finally, in order to validate the efficacy and accuracy of 
the Dicer-cut location detector 116, a sample of novel 
oligonucleotides detected thereby is preferably selected,
and validated by wet lab experiments. Laboratory results
validating the efficacy of the Dicer-cut location detector 
116 are described hereinbelow with reference to Figs. 
13-15D, Fig. 18 and also in the enclosed file Table 12. 

[0272] Laboratory confirmation of bioinformatically predicted 
oligonucleotides of the viral GAM group of oligonu- 
cleotides, and several bioinformatically predicted viral 
polynucleotides of the GR group of polynucleotides, is de- 
scribed hereinbelow with reference to Figs. 19-20. 

[0273] Reference is now made to Fig. 6C, which is a simplified 
flowchart illustrating an operation of a Dicer-cut location 
detector 116 (Fig. 2), constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion. The Dicer-cut location detector 116 preferably com- 
prises a machine learning computer program module, 
which is trained to recognize Dicer-cut locations on 
known hairpin-shaped miRNA precursors, and based on 
this training, is operable to detect Dicer-cut locations of 
novel GAM RNA (Fig. 1) on GAM FOLDED PRECURSOR RNA 
(Fig. 1). In a preferred embodiment of the present invention,
the Dicer-cut location module preferably utilizes
machine learning algorithms, including but not limited to
Support Vector Machine, Bayesian modeling, Nearest
Neighbors, and K-nearest-neighbor algorithms that are
known in the art.

[0274] When initially assessing a novel GAM FOLDED PRECURSOR 
RNA, each 19-24 nt-long segment thereof is considered 
to be a potential GAM RNA, because the Dicer-cut location 
is initially unknown. 
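
By way of illustration only, enumerating every 19-24 nt-long segment of a folded precursor as a potential GAM RNA may be sketched as follows. The function name and the toy sequence are hypothetical.

```python
def candidate_segments(precursor, min_len=19, max_len=24):
    """Yield (start, segment) for every segment of min_len..max_len nt."""
    for length in range(min_len, max_len + 1):
        for start in range(len(precursor) - length + 1):
            yield start, precursor[start:start + length]


# A 25 nt toy precursor yields 7 + 6 + 5 + 4 + 3 + 2 = 27 candidates.
toy = "ACGUACGUACGUACGUACGUACGUA"
print(sum(1 for _ in candidate_segments(toy)))
```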

[0275] For each such potential GAM RNA, the location of its 5'
end or the locations of its 5' and 3' ends are scored by at
least one recognition classifier or predictor, operating on
features such as the following: Locations of the 5' and/or 3'
ends of the known diced miRNA oligonucleotides, which
are preferably represented by their respective distances
from the 5' end of the corresponding hairpin-shaped
miRNA precursor. Additionally or alternatively, the 5' and/
or 3' ends of the known diced miRNA oligonucleotides,
which are preferably represented by the relationship between
their locations and the locations of one or more nucleotides
along the hairpin-shaped miRNA precursor. Additionally
or alternatively, the 5' and/or 3' ends of the
known diced miRNA oligonucleotides, which are preferably
represented by the relationship between their locations
and the locations of one or more bound nucleotide
pairs along the hairpin-shaped miRNA precursor. Additionally
or alternatively, the 5' and/or 3' ends of the
known diced miRNA oligonucleotides, which are preferably
represented by the relationship between their locations
and the locations of one or more mismatched nucleotide
pairs along the hairpin-shaped miRNA precursor.
Additionally or alternatively, the 5' and/or 3' ends of the
known diced miRNA oligonucleotides, which are preferably
represented by the relationship between their locations
and the locations of one or more unmatched nucleotides
along the hairpin-shaped miRNA precursor. Additionally
or alternatively, locations of the 5' and/or 3'
ends of the known diced miRNA oligonucleotides, which
are preferably represented by their respective distances
from the loop located at the center of the corresponding
hairpin-shaped miRNA precursor. Additionally or alternatively,
metrics related to the nucleotide content of the
diced miRNA and/or of the hairpin-shaped miRNA precursor.

[0276] In a preferred embodiment of the present invention, the 
Dicer-cut location detector 116 (Fig. 2) may use a Support 
Vector Machine predictor. 

[0277] In another preferred embodiment of the present invention,
the Dicer-cut location detector 116 (Fig. 2) preferably
employs an "EDIT DISTANCE" predictor, which seeks
sequences that are similar to those of known miRNA
oligonucleotides, utilizing a Nearest Neighbor algorithm,
where a similarity metric between two sequences is a variant
of the Edit Distance algorithm (Gusfield, 1997). The
EDIT DISTANCE predictor is based on an observation that
miRNA oligonucleotides tend to form clusters, the members
of which show marked sequence similarity.
[0278] In yet another preferred embodiment of the present invention,
the Dicer-cut location detector 116 (Fig. 2)
preferably uses a "TWO PHASE" predictor, which predicts
the Dicer-cut location in two distinct phases: (a) selecting
a double-stranded segment of the GAM FOLDED PRECURSOR
RNA (Fig. 1) comprising the GAM RNA by naive
Bayesian modeling and (b) detecting which strand of the
double-stranded segment contains GAM RNA (Fig. 1) by
employing either naive Bayesian or K-nearest-neighbor
modeling. K-nearest-neighbor modeling is a variant of the
"FIRST-K" predictor described hereinbelow, with parameters
optimized for this specific task. The "TWO PHASE" predictor
may be operated in two modes: either utilizing only the
first phase and thereby producing two alternative Dicer-
cut location predictions, or utilizing both phases and
thereby producing only one final Dicer-cut location.

[0279] In still another preferred embodiment of the present in- 
vention, the Dicer-cut location detector 116 preferably 
uses a "FIRST-K" predictor, which utilizes a K-nearest-neighbor
algorithm. The similarity metric between any
two sequences is 1 - E/L, where L is a parameter, preferably
8-10, and E is the edit distance between the two sequences,
taking into account only the first L nucleotides
of each sequence. If the K-nearest-neighbor scores of two 
or more locations on the GAM FOLDED PRECURSOR RNA 
(Fig. 1) are not significantly different, these locations are 
further ranked by a Bayesian model, similar to the one de- 
scribed hereinabove. 
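
By way of illustration only, the similarity metric 1 - E/L may be sketched as follows, using a standard Levenshtein edit distance over the first L nucleotides of each sequence. How the K nearest neighbors are combined into a score is not specified above, so the sketch only ranks the neighbors; the function names and toy sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, replacements)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # replacement or match
        prev = cur
    return prev[-1]


def first_l_similarity(a, b, L=8):
    """Similarity 1 - E/L over the first L nucleotides of each sequence."""
    return 1.0 - edit_distance(a[:L], b[:L]) / L


def k_nearest(candidate, known_mirnas, k=3, L=8):
    """The k known sequences most similar to the candidate."""
    ranked = sorted(known_mirnas,
                    key=lambda m: first_l_similarity(candidate, m, L),
                    reverse=True)
    return ranked[:k]


print(first_l_similarity("ACGTACGT", "ACGAACGT"))
```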

[0280] In accordance with an embodiment of the present invention,
scores of two or more of the abovementioned classifiers
or predictors are integrated, yielding an integrated
score for each potential GAM RNA. As an example, Fig. 6C
illustrates an integration of scores from two classifiers, a
3' end recognition classifier and a 5' end recognition classifier,
the scores of which are integrated to yield an integrated
score. Most preferably, the INTEGRATED SCORE of
Fig. 6C implements a "best-of-breed" approach
employing a pair of classifiers and accepting only "potential
GAM RNAs" that score highly on one of the abovementioned
"EDIT DISTANCE" or "TWO PHASE" predictors. In
this context, "high scores" means scores that have been
demonstrated to have a low false positive rate when scoring
known miRNA oligonucleotides. Alternatively, the
INTEGRATED SCORE may be derived from operation of more
or fewer than two classifiers.

[0281] The INTEGRATED SCORE is evaluated as follows: (a) the 
"potential GAM RNA" having the highest score is prefer- 
ably taken to be the most probable GAM RNA, and (b) if 
the integrated score of this most probable GAM RNA is 
higher than a pre-defined threshold, then the most prob- 
able GAM RNA is accepted as a PREDICTED GAM RNA. 
Preferably, this evaluation technique is not limited to the 
highest scoring potential GAM RNA. 
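
By way of illustration only, the evaluation of the integrated score may be sketched as follows. The sketch assumes that "scoring highly on one of" the predictors can be modeled by taking the maximum of the predictor scores, and that the scores are on a comparable scale; both are assumptions, and the function names are hypothetical.

```python
def integrated_score(predictor_scores):
    """Best-of-breed integration, modeled here as the maximum score."""
    return max(predictor_scores)


def predict_gam_rna(candidates, threshold):
    """candidates: list of (segment, [score per predictor]).
    Return the top segment if its integrated score exceeds the threshold."""
    if not candidates:
        return None
    segment, scores = max(candidates, key=lambda c: integrated_score(c[1]))
    return segment if integrated_score(scores) > threshold else None


print(predict_gam_rna([("AAA", [0.2, 0.9]), ("CCC", [0.5, 0.6])], 0.8))
```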

[0282] In a preferred embodiment of the present invention, PREDICTED
GAM RNAs comprising a low complexity nucleotide
sequence (e.g., ATATATA) may optionally be filtered
out, because there is a high probability that they are
part of a repeated element in the DNA, and are therefore
not functional, as is known in the art. For each PREDICTED
GAM RNA sequence, the number of occurrences of each
two nt combination (AA, AT, AC, etc.) comprised in that
sequence is counted. PREDICTED GAM RNA sequences where
the sum of the counts of the two most frequent combinations
is higher than a threshold, preferably 8-10, are filtered out.
As an example, when the threshold is set such that 2% of the
known miRNA oligonucleotides are filtered out, 30% of the
predicted GAM RNAs are filtered out.
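
By way of illustration only, the low complexity filter may be sketched as follows. The sketch assumes that two nt combinations are counted at every position, i.e. with overlap, which the text does not state; the threshold of 9 is one value from the preferred 8-10 range, and the function name is hypothetical.

```python
from collections import Counter


def is_low_complexity(seq, threshold=9):
    """True if the two most frequent dinucleotides together occur more
    than `threshold` times (overlapping counts)."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two > threshold


print(is_low_complexity("ATATATATATATATATATATAT"))
print(is_low_complexity("ACGTTGCAAGCTTAGGCATCGA"))
```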

[0283] Reference is now made to Fig. 7A, which is a simplified
block diagram of a preferred implementation of the target
gene binding site detector 118 described hereinabove
with reference to Fig. 2. The goal of the target gene binding
site detector 118 is to detect one or more binding
sites located in 3'UTRs of the mRNA of a known gene,
such as BINDING SITE I, BINDING SITE II and BINDING SITE
III (Fig. 1), the nucleotide sequence of which binding sites
is partially or fully complementary to a GAM RNA, thereby
determining that the abovementioned known gene is a
target gene of the GAM RNA.

[0284] The target gene binding site detector 118 (Fig. 2) receives 
a plurality of Dicer-cut sequences from hairpin structures 
140 (Fig. 6A) and a plurality of potential target gene se- 
quences 142, which are derived from sequenced DNA data 
104 (Fig. 2). 

[0285] The target gene binding site detector training & validation
functionality 128 (Fig. 3) is operative to train the target
gene binding site detector 118 on known miRNA oligonucleotides
and their respective target genes and to build a
background model for an evaluation of the probability of
achieving similar results randomly (P value) for the target
gene binding site detector 118 results. The target gene
binding site detector training & validation functionality
128 constructs the model by analyzing both heuristically
and computationally the results of the target gene binding
site detector 118.
[0286] Following operation of target gene binding site detector 
training & validation functionality 128 (Fig. 3), the target 
gene binding site detector 118 is operative to detect a 
plurality of potential novel target genes having binding 
site/s 144, the nucleotide sequence of which is partially or 
fully complementary to that of each of the plurality of 
Dicer-cut sequences from hairpin structures 140. Pre- 
ferred operation of the target gene binding site detector 
118 is further described hereinbelow with reference to 
Fig. 7B. 

[0287] Reference is now made to Fig. 7B, which is a simplified 
flowchart illustrating a preferred operation of the target 
gene binding site detector 118 of Fig. 2. 



[0288] In an embodiment of the present invention, the target
gene binding site detector 118 first compares nucleotide
sequences of each of the plurality of Dicer-cut sequences
from hairpin structures 140 (Fig. 6A) to the potential target
gene sequences 142 (Fig. 7A), such as 3' side UTRs of
known mRNAs, in order to find crude potential matches.
This step may be performed using a simple alignment algorithm
such as BLAST.

[0289] Then, the target gene binding site detector 118 filters 
these crude potential matches, to find closer matches, 
which more closely resemble published miRNA oligonucleotide
binding sites.

[0290] Next, the target gene binding site detector 118 expands 
the nucleotide sequences of the 3'UTR binding site found 
by the sequence comparison algorithm (e.g. BLAST or EDIT 
DISTANCE). A determination is made whether any sub- 
sequence of the expanded sequence may improve the 
match. The best match is considered the alignment. 

[0291] Free-energy and spatial structure are computed for the
resulting binding sites. Calculation of spatial structure
may be performed by a secondary structure folding algorithm
based on free-energy minimization, such as the
MFOLD algorithm described in Mathews et al. (J. Mol. Biol.
288: 911-940 (1999)) and Zuker (Nucleic Acids Res. 31:
3406-3415 (2003)), the disclosure of which is hereby incorporated
by reference. Free-energy, spatial structure
and the above preferences are reflected in scoring. The
resulting scores are compared with scores characteristic of
known binding sites of published miRNA oligonucleotides,
and each binding site is given a score that reflects its resemblance
to these known binding sites.
[0292] Finally, the target gene binding site detector 118 analyzes 
the spatial structure of the binding site. Each 3'UTR-GAM 
oligonucleotide pair is given a score. Multiple binding 
sites of the same GAM oligonucleotides to a 3'UTR are 
given higher scores than those that bind only once to a 
3'UTR. 

[0293] In a preferred embodiment of the present invention, per- 
formance of the target gene binding site detector 118 
may be improved by integrating several of the abovemen- 
tioned logical steps, using the methodology described 
hereinbelow. 

[0294] For each of the Dicer-cut sequences from hairpin structures
140, its starting segment, e.g. a segment comprising
the first 8 nts from its 5' end, is obtained. For each
starting segment, all of the 9 nt segments that are highly
complementary to the starting segment are calculated.
These calculated segments are referred to here as "potential
binding site end segments". In a preferred embodiment
of the present invention, for each 8 nt starting segment,
the potential binding site end segments are all 9 nt
segments whose complementary sequence contains a 7-9
nt sub-sequence that does not differ from the starting
segment by more than an insertion, deletion or replacement
of one nt. Calculation of potential binding site end
segments is preferably performed by a pre-processing
tool that maps all possible 8 nt segments to their respective
9 nt segments.
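
By way of illustration only, the pre-processing step may be sketched constructively: generate every sequence within one insertion, deletion or replacement of the 8 nt starting segment, pad it with every possible flanking nucleotide up to 9 nt, and take the complement. The sketch assumes that "complementary sequence" means the reverse complement and uses a DNA alphabet; both are assumptions, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def within_one_edit(seed):
    """The seed plus all sequences one insertion, deletion or replacement away."""
    variants = {seed}
    for i in range(len(seed)):
        variants.add(seed[:i] + seed[i + 1:])              # deletion
        for b in BASES:
            variants.add(seed[:i] + b + seed[i + 1:])      # replacement
    for i in range(len(seed) + 1):
        for b in BASES:
            variants.add(seed[:i] + b + seed[i:])          # insertion
    return variants


def potential_end_segments(seed, site_len=9):
    """All site_len-mers whose reverse complement contains a variant of seed."""
    segments = set()
    for v in within_one_edit(seed):
        pad = site_len - len(v)
        for flank in product(BASES, repeat=pad):
            flank = "".join(flank)
            for k in range(pad + 1):  # split the flank between both sides
                segments.add(reverse_complement(flank[:k] + v + flank[k:]))
    return segments


print(len(potential_end_segments("AACCGGTA")))
```

Building such a set once per possible 8 nt segment gives the mapping from starting segments to 9 nt segments described above.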
[0295] Next, the mRNA 3'UTR is parsed into all of the segments
comprised in the 3'UTR that have the same length as the
potential binding site end segments, preferably 9 nt segments.
The location of each such segment is noted, stored in a
performance-efficient data structure and compared to the
potential binding site end segments calculated in the previous
step.
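
By way of illustration only, parsing a 3'UTR into fixed-length segments and looking up the potential binding site end segments may be sketched as follows. A dictionary keyed by segment is used as the performance-efficient data structure; this choice, the function names and the toy sequence are assumptions.

```python
from collections import defaultdict


def index_kmers(utr, k=9):
    """Map every k-nt segment of the 3'UTR to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(utr) - k + 1):
        index[utr[pos:pos + k]].append(pos)
    return index


def seed_hits(index, end_segments):
    """Sorted positions at which any potential end segment occurs."""
    return sorted(pos for seg in end_segments for pos in index.get(seg, ()))


# Toy example with k=4 for brevity.
idx = index_kmers("ACGTACGTACGT", k=4)
print(seed_hits(idx, {"ACGT", "TTTT"}))
```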

[0296] The target gene binding site detector 118 then expands 
the binding site sequence, preferably in the binding site 5' 
direction (i.e. immediately upstream), assessing the degree
of its alignment to the Dicer-cut sequence from hairpin
structures 140. Preferably, an alignment algorithm is
implemented which uses specific weighting parameters 
based on an analysis of known miRNA oligonucleotide 
binding sites. As an example, it is apparent that a good 
match of the 3' end of the binding site is critically impor- 
tant, a match of the 5' end is less important but can com- 
pensate for a small number of mismatches at the 3' end of 
the binding site, and a match of the middle portion of the 
binding site is much less important. 
[0297] Next, the number of binding sites found in a specific 
3'UTR, the degree of alignment of each of these binding 
sites, and their proximity to each other are assessed and 
compared to these properties found in known binding 
sites of published miRNA oligonucleotides. In a preferred 
embodiment, the fact that many of the known binding 
sites are clustered is used to evaluate the P value of 
obtaining a cluster of a few binding sites on the same target 
gene 3'UTR in the following way. The target gene binding 
site detector 118 scans different score thresholds and 
calculates for each threshold the number and positions of 
possible binding sites with a score above the threshold. It 
then gets a P value for each threshold from a preprocessed 
calculated background matrix, described hereinbelow, 
according to the combination of the number and positions 
of binding sites. The output score for each pair of Dicer-cut 
sequences from hairpin structures 140 and potential 
target gene sequences 142 is the minimal P value, 
normalized with the number of threshold trials using a 
Bernoulli distribution. Pairs with a low P value are 
preferred. 
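
The threshold scan and normalization may be sketched as below. This is one possible reading, labeled as an assumption: the Bernoulli normalization is interpreted as the probability of at least one success in n independent trials, 1 - (1 - p_min)^n, binding site positions are ignored for brevity, and site counts beyond the last matrix row reuse that row. The function name and argument layout are hypothetical.

```python
def output_score(site_scores, thresholds, background):
    """Minimal P value over all thresholds, corrected for the number of trials.

    site_scores: scores of the candidate binding sites found in one 3'UTR.
    thresholds:  the score thresholds that are scanned.
    background:  matrix indexed as background[n_sites][threshold_index].
    """
    p_values = []
    for col, threshold in enumerate(thresholds):
        n_sites = sum(1 for s in site_scores if s >= threshold)
        row = min(n_sites, len(background) - 1)  # assumed clamping to the last row
        p_values.append(background[row][col])
    p_min = min(p_values)
    # Assumed form of the Bernoulli normalization over len(thresholds) trials.
    return 1.0 - (1.0 - p_min) ** len(thresholds)
```

For example, with two thresholds and a minimal P value of 0.04, the normalized score under this reading is 1 - 0.96^2 = 0.0784.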

[0298] As mentioned hereinabove, for each target gene, a 
preprocessed calculated background matrix is built. The 
matrix includes rows for each number of miRNA 
oligonucleotide binding sites (in the preferred embodiment, the 
matrix includes 7 rows to accommodate 0 to 6 binding 
sites), and columns for each different score threshold (in 
the preferred embodiment, the matrix includes 5 columns 
for 5 different thresholds). Each matrix cell, corresponding 
to a specific number of binding sites and a specific 
threshold, is set to be the probability of getting an equal or 
higher number of binding sites with an equal or higher score 
using random 22 nt-long sequences with the same nucleotide 
distribution as known miRNA oligonucleotides (29.5% T, 
24.5% A, 25% G and 21% C). Those probabilities are 
calculated by running the above procedure for 10000 random 
sequences that preserve the known miRNA nucleotide 
distribution (these sequences will also be referred to as 
miRNA oligonucleotide random sequences). The P value 
can be estimated as the number of random sequences 
that satisfy the matrix cell requirement divided by the total 
number of random sequences (10000). In the preferred 
embodiment, 2 matrices are calculated. The P values of 
the second matrix are calculated under a constraint that at 
least two of the binding site positions are under a 
heuristically-determined constant value. The values of the 
first matrix are calculated without this constraint. The 
target gene binding site detector 118 uses the second matrix 
if the binding site positions agree with the constraint. 
Otherwise, it uses the first. In an alternative embodiment, 
only one matrix is calculated without any constraint on 
the binding site positions. 
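
The estimation of the matrix cells from random sequences can be sketched as a simple Monte Carlo procedure. In this illustration the nucleotide frequencies are those quoted above, while `site_scores_for` is a hypothetical caller-supplied function standing in for the binding site search against the target 3'UTR; the positional constraint of the second matrix is not modeled.

```python
import random

# Nucleotide frequencies quoted in the text for known miRNA oligonucleotides.
MIRNA_NT_FREQ = {"T": 0.295, "A": 0.245, "G": 0.25, "C": 0.21}


def random_mirna_like(rng, length=22):
    """Draw one random sequence with the known miRNA nucleotide distribution."""
    nts = list(MIRNA_NT_FREQ)
    weights = list(MIRNA_NT_FREQ.values())
    return "".join(rng.choices(nts, weights=weights, k=length))


def background_matrix(site_scores_for, thresholds, max_sites=6, n_random=10000, seed=0):
    """Estimate P(at least n sites scoring >= threshold) for n = 0..max_sites.

    site_scores_for(seq) is a hypothetical function returning the scores of
    the binding sites that seq yields on the target 3'UTR.
    """
    rng = random.Random(seed)
    counts = [[0] * len(thresholds) for _ in range(max_sites + 1)]
    for _ in range(n_random):
        scores = site_scores_for(random_mirna_like(rng))
        for col, threshold in enumerate(thresholds):
            n_sites = sum(1 for s in scores if s >= threshold)
            for row in range(min(n_sites, max_sites) + 1):
                counts[row][col] += 1  # this sequence has at least `row` sites
    return [[c / n_random for c in row] for row in counts]
```

With the defaults, the result has 7 rows (0 to 6 binding sites) and one column per threshold, matching the layout of the preferred embodiment; the first row is always 1.0, since every sequence has at least zero sites.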
[0299] A test performed using the target gene binding site 
detector 118 shows that all of the known miRNA 
oligonucleotide target genes are found using this algorithm with 
a P value of less than 0.5%. Running known miRNA 
oligonucleotides against 3400 potential 3'UTR of target 
gene sequences yields on average 32 target genes for 
each miRNA oligonucleotide with a P value of less than 0.5%, 
while background sequences, as well as inverse or 
complement sequences of known miRNA oligonucleotides (which 
preserve their high order sequence statistics), yielded, as 
expected, 17 target genes on average. This result indicates 
that the algorithm has the ability to detect real target 
genes with 47% accuracy, i.e. (32 - 17)/32 of the target 
genes found exceed the background level. 

[0300] Finally, orthology data may optionally be used to further 
prefer binding sites based on their conservation. Prefer- 
ably, this may be used in cases such as (a) where both the 
target mRNA and miRNA oligonucleotide have orthologues 
in another organism, e.g. Human-Mouse orthology, or (b) 
where a miRNA oligonucleotide (e.g. viral miRNA oligonu- 
cleotide) targets two mRNAs in orthologous organisms. In 
such cases, binding sites that are conserved are preferred. 

[0301] In accordance with another preferred embodiment of the 
present invention, binding sites may be searched by a re- 
verse process. Sequences of K (preferably 22) nucleotides 
in a UTR of a target gene are assessed as potential bind- 
ing sites. A sequence comparison algorithm, such as 
BLAST or EDIT DISTANCE variant, is then used to search 
elsewhere in the genome for partially or fully 
complementary sequences that are found in known miRNA 
oligonucleotides or computationally-predicted GAM 
oligonucleotides. Only complementary sequences that meet 
predetermined spatial structure and free-energy criteria as 
described hereinabove, are accepted. Clustered binding 
sites are strongly preferred and potential binding sites 
and potential GAM oligonucleotides that occur in evolu- 
tionarily-conserved genomic sequences are also pre- 
ferred. Scoring of candidate binding sites takes into ac- 
count free-energy and spatial structure of the binding site 
complexes, as well as the aforesaid preferences. 
[0302] UTRs of GAM viral target genes were preferably extracted 
directly from annotation of UTR records. Alternatively, 
UTRs of GAM viral target genes were preferably extracted 
by taking the sequences spanning from the last coding 
position to the 3' end of the mRNA sequence annotation. 
Alternatively, UTRs of GAM viral target genes were preferably 
extracted by taking 400 nts downstream of the end of the 
coding region of the mRNA sequence. All of the 
abovementioned methods were applied on complete viral genomes 
data in GenBank format from the NCBI RefSeq database, 
version 18-Jan-2004 
(ftp://ftp.ncbi.nih.gov/refseq/release/viral). 
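
The three extraction alternatives can be illustrated on plain sequence strings. The sketch below makes several assumptions that are not in the text: coordinates are 0-based with `cds_end` pointing one past the last coding nucleotide, strand handling and GenBank parsing are omitted, the order in which the alternatives are tried is chosen for illustration, and all function names are hypothetical.

```python
def utr_from_mrna(mrna_seq, cds_end):
    """Sequence from the last coding position to the 3' end of the mRNA."""
    return mrna_seq[cds_end:]


def utr_fixed_window(seq, cds_end, window=400):
    """Up to `window` nts immediately downstream of the coding region."""
    return seq[cds_end:cds_end + window]


def extract_utr(seq, cds_end, annotated_utr=None, mrna_annotated=True):
    """Try the three alternatives in an assumed order of preference.

    annotated_utr: optional (start, end) pair taken from a UTR record.
    mrna_annotated: whether the 3' end of the mRNA is annotated, so that
                    `seq` ends at the mRNA 3' end.
    """
    if annotated_utr is not None:
        start, end = annotated_utr
        return seq[start:end]
    if mrna_annotated:
        return utr_from_mrna(seq, cds_end)
    return utr_fixed_window(seq, cds_end)
```

For instance, with a 9 nt coding region followed by 500 nts, the fixed-window alternative returns the first 400 downstream nts, while the mRNA-based alternative returns all 500.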
[0303] Reference is now made to Fig. 8, which is a simplified 
flowchart illustrating a preferred operation of the function 
& utility analyzer 120 described hereinabove with reference 
to Fig. 2. The goal of the function & utility analyzer 
120 is to determine if a potential target gene is in fact a 
valid clinically useful target gene. Since a potential novel 
GAM oligonucleotide binding a binding site in the UTR of 
a target gene is understood to inhibit expression of that 
target gene, and if that target gene is shown to have a 
valid clinical utility, then in such a case it follows that the 
potential novel oligonucleotide itself also has a valid use- 
ful function which is the opposite of that of the target 
gene. 

[0304] The function & utility analyzer 120 preferably receives as 
input a plurality of potential novel target genes having 
binding site/s 144 (Fig. 7A), generated by the target gene 
binding site detector 118 (Fig. 2). Each potential oligonu- 
cleotide is evaluated as follows: First, the system checks 
to see if the function of the potential target gene is scien- 
tifically well established. Preferably, this can be achieved 
bioinformatically by searching various published data 
sources presenting information on known function of pro- 
teins. Many such data sources exist and are published, as 
is well known in the art. Next, for those target genes the 
function of which is scientifically known and is well docu- 
mented, the system then checks if scientific research data 
exists which links them to known diseases. For example, a 
preferred embodiment of the present invention utilizes 
the OMIM(TM) (Hamosh et al, 2002) database published by 
NCBI, which summarizes research publications relating to 
genes which have been shown to be associated with 
diseases. Finally, the specific possible utility of the target 
gene is evaluated. While this process too may be 
facilitated by bioinformatic means, it might require manual 
evaluation of published scientific research regarding the 
target gene, in order to determine the utility of the target 
gene to the diagnosis and/or treatment of a specific disease. 
Only potential novel oligonucleotides, the target genes of 
which have passed all three examinations, are accepted as 
novel oligonucleotides. 

[0305] Reference is now made to Fig. 9, which is a simplified 
diagram describing each of a plurality of novel 
bioinformatically-detected regulatory polynucleotides, referred to 
in this Table as the Genomic Record (GR) polynucleotide. GR 
encodes an operon-like cluster of novel miRNA-like 
oligonucleotides, each of which in turn modulates 
expression of at least one target gene. The function and utility of 
at least one target gene is known in the art. 

[0306] The GR PRECURSOR is a novel, bioinformatically-detected, 
regulatory, non-protein-coding polynucleotide. The 
method by which the GR PRECURSOR is detected is 
described hereinabove with additional reference to Figs. 1-9. 

[0307] GR PRECURSOR is preferably encoded by a viral genome 
and contains a cluster of novel viral oligonucleotides, 
which preferably bind to human target genes or to virus 
genes. Alternatively or additionally, GR PRECURSOR is en- 
coded by the human genome and contains a cluster of 
novel human oligonucleotides, which preferably bind to 
viral target genes or to human genes. 

[0308] The GR PRECURSOR encodes GR PRECURSOR RNA that is 
typically several hundred to several thousand nts long. 
The GR PRECURSOR RNA folds spatially, forming the GR 
FOLDED PRECURSOR RNA. It is appreciated that the GR 
FOLDED PRECURSOR RNA comprises a plurality of what is 
known in the art as hairpin structures. Hairpin structures 
result from the presence of segments of the nucleotide 
sequence of GR PRECURSOR RNA in which the first half of 
each such segment has a nucleotide sequence which is at 
least a partial, and sometimes an accurate, reverse- 
complement sequence of the second half thereof, as is 
well known in the art. 

[0309] The GR FOLDED PRECURSOR RNA is naturally processed by 
cellular enzymatic activity into a plurality of separate GAM 
precursor RNAs, herein schematically represented by 
GAM1 FOLDED PRECURSOR RNA through GAM3 FOLDED 
PRECURSOR RNA. Each GAM folded precursor RNA is a 
hairpin-shaped RNA segment, corresponding to GAM 
FOLDED PRECURSOR RNA of Fig. 1. 

[0310] The abovementioned GAM folded precursor RNAs are 
diced by DICER COMPLEX of Fig. 1, yielding short RNA 
segments of about 22 nts in length, schematically 
represented by GAM1 RNA through GAM3 RNA. Each GAM RNA 
corresponds to GAM RNA of Fig. 1. 

[0311] GAM1 RNA, GAM2 RNA and GAM3 RNA each bind 
complementarily to binding sites located in the untranslated 
regions of their respective target genes, designated GAM1 
TARGET RNA, GAM2 TARGET RNA and GAM3 TARGET RNA, 
respectively. These target binding sites correspond to 
BINDING SITE I, BINDING SITE II and BINDING SITE III of Fig. 
1. The binding of each GAM RNA to its target RNA inhibits 
the translation of its respective target proteins, 
designated GAM1 TARGET PROTEIN, GAM2 TARGET PROTEIN 
and GAM3 TARGET PROTEIN, respectively. 

[0312] It is appreciated that the specific functions, and 
accordingly the utilities, of the GR polynucleotide are correlated 
with and may be deduced from the identity of the target 
genes that are inhibited by GAM RNAs that are present in 
the operon-like cluster of the polynucleotide. Thus, for 
the GR polynucleotide, the relevant target genes are those 
schematically represented by GAM1 TARGET PROTEIN 
through GAM3 TARGET PROTEIN, which are inhibited by the 
GAM RNA. The function of these target genes is elaborated 
in Table 8, hereby incorporated herein. 

[0313] Reference is now made to Fig. 10, which is a block 
diagram illustrating different utilities of oligonucleotides of 
the novel group of oligonucleotides of the present 
invention referred to here as GAM oligonucleotides and GR 
polynucleotides. The present invention discloses a first 
plurality of novel oligonucleotides referred to here as GAM 
oligonucleotides and a second plurality of operon-like 
polynucleotides referred to here as GR polynucleotides, 
each of the GR polynucleotides encoding a plurality of GAM 
oligonucleotides. The present invention further discloses a 
very large number of known target genes, which are 
bound by, and the expression of which is modulated by, 
each of the novel oligonucleotides of the present 
invention. Published scientific data referenced by the present 
invention provides specific, substantial, and credible 
evidence that the abovementioned target genes, modulated 
by novel oligonucleotides of the present invention, are 
associated with various diseases. Specific novel 
oligonucleotides of the present invention, target genes thereof 
and diseases associated therewith, are described 
hereinbelow with reference to Tables 1 through 12. It is 
therefore appreciated that a function of GAM oligonucleotides 
and GR polynucleotides of the present invention is 
modulation of expression of target genes related to known viral 
diseases, and that therefore utilities of novel 
oligonucleotides of the present invention include diagnosis and 
treatment of the abovementioned diseases. 
[0314] Fig. 10 describes various types of diagnostic and 
therapeutic utilities of novel oligonucleotides of the present 
invention. A utility of novel oligonucleotides of the present 
invention is detection of GAM oligonucleotides and of GR 
polynucleotides. It is appreciated that since GAM 
oligonucleotides and GR polynucleotides modulate expression of 
disease related target genes, detection of expression 
of GAM oligonucleotides in clinical scenarios associated 
with said viral diseases is a specific, substantial and 
credible utility. Diagnosis of novel oligonucleotides of the 
present invention may preferably be implemented by RNA 
expression detection techniques, including but not limited 
to biochips, as is well known in the art. Diagnosis of 
expression of oligonucleotides of the present invention may 
be useful for research purposes, in order to further 
understand the connection between the novel 
oligonucleotides of the present invention and the 
abovementioned related viral diseases, for disease diagnosis and 
prevention purposes, and for monitoring disease 
progress. 

[0315] Another utility of novel oligonucleotides of the present in- 
vention is anti-GAM therapy, a mode of therapy which al- 
lows up regulation of a viral disease-related target gene 
of a novel GAM oligonucleotide of the present invention, 
by lowering levels of the novel GAM oligonucleotide which 
naturally inhibits expression of that target gene. This 
mode of therapy is particularly useful with respect to tar- 
get genes which have been shown to be under-expressed 
in association with a specific viral disease. Anti-GAM ther- 
apy is further discussed hereinbelow with reference to 
Figs. 11A and 11B. 

[0316] A further utility of novel oligonucleotides of the present 
invention is GAM replacement therapy, a mode of therapy 
which achieves down regulation of a viral disease related 
target gene of a novel GAM oligonucleotide of the present 
invention, by raising levels of the GAM which naturally 
inhibits expression of that target gene. This mode of 
therapy is particularly useful with respect to target genes 
which have been shown to be over-expressed in associa- 
tion with a specific viral disease. GAM replacement ther- 
apy involves introduction of supplementary GAM products 
into a cell, or stimulation of a cell to produce excess GAM 
products. GAM replacement therapy may preferably be 
achieved by transfecting cells with an artificial DNA 
molecule encoding a GAM which causes the cells to pro- 
duce the GAM product, as is well known in the art. 
[0317] Yet a further utility of novel oligonucleotides of the 
present invention is modified GAM therapy. Disease 
conditions are likely to exist, in which a mutation in a binding 
site of a GAM RNA prevents natural GAM RNA from 
effectively binding and inhibiting a viral disease related target 
gene, causing up regulation of that target gene, and thereby 
contributing to the disease pathology. In such conditions, 
a modified GAM oligonucleotide is designed which 
effectively binds the mutated GAM binding site, i.e. is an 
effective anti-sense of the mutated GAM binding site, and is 
introduced into disease-affected cells. Modified GAM 
therapy is preferably achieved by transfecting cells with an 
artificial DNA molecule encoding the modified GAM which 
causes the cells to produce the modified GAM product, as 
is well known in the art. 

[0318] Reference is now made to Figs. 11A and 11B, which are 
simplified diagrams which when taken together illustrate 
anti-GAM therapy mentioned hereinabove with reference 
to Fig. 10. A utility of novel GAMs of the present invention 
is anti-GAM therapy, a mode of therapy which allows up 
regulation of a viral disease-related target gene of a novel 
GAM of the present invention, by lowering levels of the 
novel GAM which naturally inhibits expression of that 
target gene. Fig. 11A shows a normal GAM inhibiting 
translation of a target gene by binding of GAM RNA to a 
BINDING SITE found in an untranslated region of GAM TARGET 
RNA, as described hereinabove with reference to Fig. 1. 

[0319] Fig. 11B shows an example of anti-GAM therapy. 
ANTI-GAM RNA is a short artificial RNA molecule the sequence of 
which is an anti-sense of GAM RNA. Anti-GAM treatment 
comprises transfecting diseased cells with ANTI-GAM 
RNA, or with a DNA encoding thereof. The ANTI-GAM RNA 
binds the natural GAM RNA, thereby preventing binding of 
natural GAM RNA to its BINDING SITE. This prevents 
natural translation inhibition of GAM TARGET RNA by GAM 
RNA, thereby up regulating expression of GAM TARGET 
PROTEIN. 

[0320] It is appreciated that anti-GAM therapy is particularly use- 
ful with respect to target genes which have been shown to 
be under-expressed in association with a specific viral 
disease. 

[0321] Furthermore, anti-GAM therapy is particularly useful, 
since it may be used in situations in which technologies 
known in the art as RNAi and siRNA cannot be utilized. As 
is known in the art, RNAi and siRNA are technologies 
which offer means for artificially inhibiting expression of a 
target protein, by artificially designed short RNA segments 
which bind complementarily to mRNA of said target 
protein. However, RNAi and siRNA cannot be used to directly 
up regulate translation of target proteins. 

[0322] Reference is now made to Fig. 12A, which is a bar graph 
illustrating performance results of the hairpin detector 
114 (Fig. 2) constructed and operative in accordance with 
a preferred embodiment of the present invention. 

[0323] Fig. 12A illustrates efficacy of several features used by the 
hairpin detector 114 to detect GAM FOLDED PRECURSOR 
RNAs (Fig. 1). The values of each of these features are 
compared between a set of published miRNA precursor 
oligonucleotides, represented by shaded bars, and a set of 
random hairpins folded from the human genome, denoted 
hereinbelow as a hairpin background set, represented by 
white bars. The published miRNA precursor 
oligonucleotides set is taken from the RFAM database, Release 2.1, 
and includes 148 miRNA oligonucleotides from H. Sapiens. 
The background set comprises a set of 10,000 hairpins 
folded from the human genome. 
[0324] It is appreciated that the hairpin background set is 
expected to comprise some valid, previously undetected 
hairpin-shaped miRNA precursor-like GAM FOLDED 
PRECURSOR RNAs of the present invention, and many 
hairpin-shaped sequences that are not hairpin-shaped 
miRNA-like precursors. 

[0325] For each feature, the bars depict the percent of known 
miRNA hairpin precursors (shaded bars) and the percent 
of background hairpins (white bars) that pass the thresh- 
old for that feature. The percent of known miRNA 
oligonucleotides that pass the threshold indicates the 
sensitivity of the feature, while the corresponding back- 
ground percent implies the specificity of the feature, al- 
though not precisely, because the background set com- 
prises both true and false examples. 



[0326] The first bar pair, labeled Thermodynamic Stability Selec- 
tion, depicts hairpins that have passed the selection of 
"families" of closely related hairpin structures, as de- 
scribed hereinabove with reference to Fig. 5B. 

[0327] The second bar pair, labeled Hairpin Score, depicts hair- 
pins that have been selected by hairpin detector 114 (Fig. 
5B), regardless of the "families" selection. 

[0328] The third bar pair, labeled Conserved, depicts hairpins 
that are conserved in human, mouse and rat (UCSC 
Goldenpath (TM) HG16 database). 

[0329] The fourth bar pair, labeled Expressed, depicts hairpins 
that are found in EST blocks. 

[0330] The fifth bar pair, labeled Integrated Selection, depicts 
hairpin structures predicted by a preferred embodiment of 
the present invention to be valid GAM PRECURSORS. In a 
preferred embodiment of the present invention, a hairpin 
may be considered to be a GAM PRECURSOR if its hairpin 
detector score is above 0, and it is in one of the following 
groups: a) in an intron and conserved or b) in an 
intergenic region and conserved or c) in an intergenic region 
and expressed, as described below. Further filtering of 
GAM precursors may be obtained by selecting hairpins with 
a high score of Dicer-cut location detector 116 as 
described hereinabove with reference to Figs. 6A-6C, and 
with predicted miRNA oligonucleotides, which pass the 
low complexity filter as described hereinabove, and whose 
targets are selected by the target gene binding site 
detector 118 as described hereinabove with reference to Figs. 
7A-7B. 
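
The integrated selection rule stated above reduces to a short predicate. The sketch below is illustrative only: the `Hairpin` record, its field names and the region labels are hypothetical, and the further filtering steps (Dicer-cut location score, low complexity filter, target selection) are not included.

```python
from dataclasses import dataclass


@dataclass
class Hairpin:
    score: float      # hairpin detector score
    region: str       # "intron", "intergenic" or another label
    conserved: bool   # conserved in human, mouse and rat
    expressed: bool   # found in EST blocks


def is_gam_precursor_candidate(h):
    """Apply the rule: score above 0 and one of groups a), b) or c)."""
    if h.score <= 0:
        return False
    if h.region == "intron":
        return h.conserved                 # group a)
    if h.region == "intergenic":
        return h.conserved or h.expressed  # groups b) and c)
    return False
```

Note that, as the rule is worded, an intronic hairpin that is expressed but not conserved is not selected.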

[0331] It is appreciated that these results validate the sensitivity 
and specificity of the hairpin detector 114 (Fig. 2) in iden- 
tifying novel GAM FOLDED PRECURSOR RNAs, and in ef- 
fectively distinguishing them from the abundant hairpins 
found in the genome. 

[0332] Reference is now made to Fig. 12B, which is a line graph 
illustrating accuracy of a Dicer-cut location detector 116 
(Fig. 2) constructed and operative in accordance with a 
preferred embodiment of the present invention. 

[0333] To determine the accuracy of the Dicer-cut location 
detector 116, a stringent training and test set was chosen 
from the abovementioned set of 440 known miRNA 
oligonucleotides, such that no two miRNA 
oligonucleotides in the set are homologous. This was performed 
to get a lower bound on the accuracy and avoid effects of 
similar known miRNA oligonucleotides appearing in both 
the training and test sets. On this stringent set of size 
204, k-fold cross validation with k=3 was performed to 
determine the percent of known miRNA oligonucleotides 
in which the Dicer-cut location detector 116 described 
hereinabove predicted the correct miRNA oligonucleotide 
up to two nucleotides from the correct location. The 
accuracy of the TWO PHASED predictor is depicted in the 
graph. The accuracy of the first phase of the TWO PHASED 
predictor is depicted by the upper line, and that of both 
phases of the TWO PHASED predictor is depicted by the 
lower line. Both are binned by the predictor score, where 
the score is the score of the first stage. 

[0334] It is appreciated that these results validate the accuracy of 
the Dicer-cut location detector 116. 

[0335] Reference is now made to Fig. 12C, which is a bar graph 
illustrating the performance results of the target gene 
binding site detector 118 (Fig. 7A) constructed and opera- 
tive in accordance with a preferred embodiment of the 
present invention. 

[0336] Fig. 12C illustrates specificity and sensitivity of the target 
gene binding site detector 118. The values presented are 
the result of testing 10000 artificial miRNA 
oligonucleotide sequences (random 22 nt sequences with the 
same base composition as published miRNA 
oligonucleotide sequences). Adjusting the threshold parameters to 
fulfill 90% sensitivity of validated, published miRNA-3'UTR 
pairs, requires the P VAL of potential target gene 
sequences-Dicer-cut sequences to be less than 0.01 and 
also the P VAL of potential target ortholog gene 
sequences-Dicer-cut sequences to be less than 0.05. The 
target gene binding site detector 118 can filter out 99.7% 
of potential miRNA/gene pairs, leaving only the 0.3% that 
contain the most promising potential miRNA/gene pairs. 
Limiting the condition for the P VAL of potential target 
ortholog gene sequences-Dicer-cut sequences to be less 
than 0.01 reduces the sensitivity ratio to 70% but filters 
out more than 50% of the remaining 0.3%, to a final ratio 
of less than 0.15%. 
[0337] It is appreciated that these results validate the sensitivity 
and specificity of the target gene binding site detector 
118. 

[0338] Reference is now made to Fig. 13, which is a summary 
table of laboratory results validating the expression of 29 
novel human GAM RNA oligonucleotides in HeLa cells or, 
alternatively, in liver or thymus tissues detected by the 
bioinformatic oligonucleotide detection engine 100 (Fig. 
2). 



[0339] As a positive control, we used a reference set of eight 
known human miRNA oligonucleotides: hsa-MIR-21; 
hsa-MIR-27b; hsa-MIR-186; hsa-MIR-93; hsa-MIR-26a; 
hsa-MIR-191; hsa-MIR-31; and hsa-MIR-92. All positive 
controls were successfully validated by sequencing. 

[0340] The table of Fig. 13 lists all GAM RNA predictions whose 
expression was validated. The field "Primer Sequence" 
contains the "specific" part of the primer; the field "Se- 
quenced sequence" represents the nucleotide sequence 
detected by cloning (excluding the hemispecific primer 
sequence); the field "Predicted GAM RNA" contains the 
GAM RNA predicted sequence; the field "Distance" indi- 
cates the distance from the Primer; the number of mis- 
matches between the "specific" region of the primer and 
the corresponding part of the GAM RNA sequence; the 
field "GAM Name" contains GAM RNA PRECURSOR ID fol- 
lowed by "A" or "B", which represents the GAM RNA posi- 
tion on the precursor as elaborated in the attached Ta- 
bles. 

[0341] A primer was designed such that its first half, the 5' 
region, is complementary to the adaptor sequence and its 
second half, the 3' region, anneals to the 5' terminus of 
GAM RNA sequence, yielding a hemispecific primer (as 
elaborated hereinbelow in the Methods section). A sample 
of 13 predicted GAM RNA sequences was examined by 
PCR using hemispecific primers and a primer specific to 
the 3' adaptor. PCR products were cloned into plasmid 
vectors and then sequenced. For all 13 predicted GAM 
RNA sequences, the GAM RNA sequence found in the 
hemispecific primer plus the sequence observed between 
the hemispecific primer and the 3' adaptor was completely 
included in the expected GAM RNA sequence (rows 1-7, 
and 29). The rest are GAM RNA predictions that were 
verified by cloning and sequencing, yet, by using a primer 
that was originally designed for a slightly different 
prediction. 

[0342] It is appreciated that failure to detect a predicted 
oligonucleotide in the lab does not necessarily indicate a 
mistaken bioinformatic prediction. Rather, it may be due to 
technical sensitivity limitation of the lab test, or because 
the predicted oligonucleotides are not expressed in the 
tissue examined, or at the development phase tested. The 
observed GAM RNAs may be strongly expressed in HeLa 
cells while the original GAM RNAs are expressed at low 
levels in HeLa cells or not expressed at all. Under such 
circumstances, primer sequences containing up to three 
mismatches from a specific GAM RNA sequence may 
amplify it. Thus, we also considered cases in which 
differences of up to 3 mismatches in the hemispecific primer 
occur. 

[0343] The 3' terminus of observed GAM RNA sequences is often 
truncated or extended by one or two nucleotides. Cloned 
sequences that were sequenced from both 5' and 3' 
termini have an asterisk appended to the row number. 

[0344] Interestingly, the primer sequence followed by the 
observed cloned sequence is contained within five GAM RNA 
sequences of different lengths, and belongs to 24 
precursors derived from distinct loci (Row 29). Out of these, one 
precursor appears four times in the genome and its 
corresponding GAM Names are 351973-A, 352169-A, 
352445-A and 358164-A. 

[0345] The sequence presented in Row 29 is a representative of 
the group of five GAM RNAs. The full list of GAM RNA 
sequences and their corresponding precursors is as follows 
(each GAM RNA sequence is followed by the GAM Name): 
TCACTGCAACCTCCACCTCCCA (352092, 352651, 355761), 
TCACTGCAACCTCCACCTCCCG (351868, 352440, 351973, 
352169, 352445, 358164, 353737, 352382, 352235, 
352232, 352268, 351919, 352473, 352444, 353638, 
353004, 352925, 352943), 
TCACTGCAACCTCCACCTCCTG (358311), 
TCACTCCAACCTCCACCTTCAG (353323), and 
TCACTCCAACCTCCACCTTCCC (353856). 
[0346] METHOD SECTION 

[0347] CELL LINES 

[0348] Three common human cell lines, obtained from Dr. Yonat 
Shemer at Soroka Medical Center, Be'er Sheva, Israel, were 
used for RNA extraction: Human Embryonic Kidney 
HEK-293 cells, Human Cervix Adenocarcinoma HeLa cells and 
Human Prostate Carcinoma PC3 cells. 

[0349] RNA PURIFICATION 

[0350] Several sources of RNA were used to prepare libraries: 

[0351] Total HeLa S100 RNA was prepared from HeLa S100 
cellular fraction (4C Biotech, Belgium) through an SDS 
(1%)-Proteinase K (200 µg/ml) 30 minute incubation at 37 C 
followed by an acid Phenol-Chloroform purification and 
isopropanol precipitation (Sambrook et al; Molecular 
Cloning - A Laboratory Manual). 

[0352] Total HeLa, HEK-293 and PC3 cell RNA was prepared 
using the standard Tri-Reagent protocol (Sigma) according 
to the manufacturer's instructions, except that 1 volume 
of isopropanol was substituted with 3 volumes of ethanol. 

[0353] Nuclear and Cytoplasmic RNA was prepared from HeLa or 
HEK-293 cells in the following manner: 

[0354] Cells were washed and harvested in ice-cold PBS and 
precipitated in a swing-out rotor at 1200 rpm at 4 C for 5 
minutes. Pellets were loosened by gentle vortexing. 4ml of 
"NP40 lysis buffer" (10mM TrisHCl, 5mM MgCl2, 10mM 
NaCl, 0.5% Nonidet P40, 1mM Spermidine, 1mM DTT, 
140U/ml rRnasine) was then added per 5*10^7 cells. Cells 
and lysis buffer were incubated for 5 minutes on ice and 
centrifuged in a swing-out rotor at 500xg at 4 C for 5 
minutes. Supernatant, termed cytoplasm, is carefully 
removed to a tube containing SDS (1% final) and 
proteinase-K (200 µg/ml final). Pellet, termed nuclear fraction, is 
re-washed and incubated with a similar amount of fresh lysis 
buffer. Lysis is monitored visually under a microscope at 
this stage, typically for 5 minutes. Nuclei are pelleted in a 
swing-out rotor at 500xg at 4 C for 5 minutes. 
Supernatant is pooled, incubated at 37 C for 30 minutes, 
Phenol/Chloroform-extracted, and RNA is 
alcohol-precipitated (Sambrook et al). Nuclei are loosened and then 
homogenized immediately in >10 volumes of Tri-Reagent 
(Sigma). Nuclear RNA is then prepared according to the 
manufacturer's instructions. 
[0355] TOTAL TISSUE RNA 

[0356] Total tissue RNA was obtained from Ambion USA, and in- 
cluded Human Liver, Thymus, Placenta, Testes and Brain. 
[0357] RNA SIZE FRACTIONATION 

[0358] RNA used for libraries was always size-fractionated. 
Fractionation was done by loading up to 500 microgram RNA 
per YM100 Amicon Microcon column (Millipore) followed 
by a 500xg centrifugation for 40 minutes at 4 C. 
Flow-through "YM100" RNA is about one quarter of the total 
RNA and was used for library preparation or fractionated 
further by loading onto a YM30 Amicon Microcon column 
(Millipore) followed by a 13,500xg centrifugation for 25 
minutes at 4 C. Flow-through "YM30" was used for library 
preparation "as is" and consists of less than 0.5% of total 
RNA. Additional size fractionation was achieved during 
library preparation. 

[0359] LIBRARY PREPARATION 

[0360] Two types of cDNA libraries, designated "One-tailed" and
"Ligation", were prepared from one of the abovementioned
fractionated RNA samples. RNA was dephosphorylated and ligated
to an RNA (designated with lowercase letters)-DNA (designated
with UPPERCASE letters) hybrid 5'-phosphorylated, 3' idT
blocked 3'-adapter (5'-P-uuuAACCGCATCCTTCTC-idT-3' Dharmacon
# P-002045-01-05) (as elaborated in Elbashir et al., Genes
Dev. 15:188-200 (2001)) resulting in ligation only of RNase
III type cleavage products. 3'-Ligated RNA was excised and
purified from a half 6%, half 13% polyacrylamide gel to remove
excess adapter with a Nanosep 0.2 micrometer centrifugal
device (Pall) according to instructions, and precipitated with
glycogen and 3 volumes of ethanol. Pellet was resuspended in a
minimal volume of water.

[0361] For the "Ligation" library, a DNA (UPPERCASE)-RNA
(lowercase) hybrid 5'-adapter (5'-TACTAATACGACTCACTaaa-3'
Dharmacon # P-002046-01-05) was ligated to the 3'-adapted RNA,
which was reverse transcribed with "EcoRI-RT"
(5'-GACTACCTGGAATTCAAGGATGCGGTTAAA-3') and PCR-amplified with
two external primers essentially as in Elbashir et al. (2001),
except that primers were "EcoRI-RT" and "PstI Fwd"
(5'-CACCCAACCCTCCACATACGACTCACTAAA-3'). This PCR product was
used as a template for a second round of PCR with one
hemispecific and one external primer or with two hemispecific
primers.
[0362] For the "One-tailed" library, the 3'-adapted RNA was
annealed to 20pmol primer "EcoRI-RT" by heating to 70°C and
cooling 0.1°C/sec to 30°C and then reverse-transcribed with
Superscript II RT (according to manufacturer's instructions,
Invitrogen) in a 20 microliter volume for 10 alternating 5
minute cycles of 37°C and 45°C. Subsequently, RNA was digested
with 1 microliter 2M NaOH and 2mM EDTA at 65°C for 10 minutes.
cDNA was loaded on a polyacrylamide gel, excised and
gel-purified from excess primer as above (invisible, judged by
primer run alongside) and resuspended in 13 microliters of
water. Purified cDNA was then oligo-dC tailed with 400U of
recombinant terminal transferase (Roche Molecular
Biochemicals), 1 microliter 100 microM dCTP, 1 microliter 15mM
CoCl2, and 4 microliters reaction buffer, to a final volume of
20 microliters for 15 minutes at 37°C. Reaction was stopped
with 2 microliters 0.2M EDTA and 15 microliters 3M NaOAc pH
5.2. Volume was adjusted to 150 microliters with water. The
sample was extracted with phenol:bromochloropropane 10:1 and
subsequently precipitated with glycogen and 3 volumes of
ethanol. C-tailed cDNA was used as a template for PCR with the
external primers "T3-PstBsg(G/I)18"
(5'-AATTAACCCTCACTAAAGGCTGCAGGTGCAGGIGGGIIGGGIIGGGIIGN-3'
where I stands for inosine and N for any of the 4 possible
deoxynucleotides), and with "EcoRI Nested"
(5'-GGAATTCAAGGATGCGGTTA-3'). This PCR product was used as a
template for a second round of PCR with one hemispecific and
one external primer or with two hemispecific primers.
[0363] PRIMER DESIGN AND PCR

[0364] Hemispecific primers were constructed for each predicted
GAM RNA oligonucleotide by an in-house program designed to
choose about half of the 5' or 3' sequence of the GAM RNA
corresponding to a TM of about 30-34°C constrained by an
optimized 3' clamp, appended to the cloning adapter sequence
(for "One-tailed" libraries, 5'-GGNNGGGNNG on the 5' end or
TTTAACCGCATC-3' on the 3' end of the GAM RNA; for "Ligation"
libraries, the same 3' adapter and 5'-CGACTCACTAAA on the 5'
end of the GAM RNA). Consequently, a fully complementary
primer of a TM higher than 60°C was created covering only one
half of the GAM RNA sequence, permitting the unbiased
elucidation by sequencing of the other half.
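
As a rough illustration of the selection step described above, the following Python sketch picks the 5' portion of a GAM RNA whose estimated melting temperature falls in the 30-34°C range and whose length is closest to half of the oligonucleotide, and then prepends a cloning adapter. The in-house program itself is not disclosed here. The Wallace rule (2°C per A/T, 4°C per G/C) is an assumed stand-in for its TM calculation, the 3' clamp optimization is omitted, and the function names and the example input sequence are hypothetical.

```python
def wallace_tm(seq):
    """Estimate Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: the source does not state how TM was calculated.
    """
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def choose_5prime_half(gam, tm_min=30, tm_max=34):
    """Return the 5' prefix with Tm inside [tm_min, tm_max] whose length
    is closest to half of the GAM RNA, or None if no prefix qualifies."""
    half_length = len(gam) / 2
    candidates = [
        gam[:n]
        for n in range(1, len(gam) + 1)
        if tm_min <= wallace_tm(gam[:n]) <= tm_max
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: abs(len(s) - half_length))


def hemispecific_primer(adapter_5prime, gam):
    """Prepend the adapter to the chosen half (3' clamp grading not modeled)."""
    half = choose_5prime_half(gam)
    if half is None:
        return None
    return adapter_5prime + half.upper().replace("U", "T")


if __name__ == "__main__":
    example_gam = "TGAGGTAGTAGGTTGTATAGTT"  # hypothetical input sequence
    print(choose_5prime_half(example_gam))
    print(hemispecific_primer("GGNNGGGNNG", example_gam))
```

Because only one half of the GAM RNA is encoded in the primer, any read of the other half in the sequenced product is independent of the primer, which is the point of the hemispecific design.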

[0365] For each primer, the following criteria were used: Primers
were graded according to the TM of the primer half and the
nucleotide content of 3 nucleotides of the 3' clamp from worst
to best, roughly: GGG-3' < CCC-3' < TTT-3'/AAA-3' < GG-3' <
CC-3' < a TM lower than 30 < a TM higher than 34 <
TT-3'/AA-3' < 3 G/C nucleotide combination < 3 A/T nucleotide
combination < any combination of two/three different
nucleotides < any combination of three/three different
nucleotides.
[0366] VALIDATION OF PCR PRODUCT BY SOUTHERN BLOT

[0367] GAM RNA oligonucleotides were validated by hybridization
of Polymerase Chain Reaction (PCR)-product Southern 
blots with a probe to the predicted GAM RNA. 

[0368] PCR product sequences were confirmed by Southern blot
(Southern E.M., Biotechnology 1992, 24:122-139 (1975)) and
hybridization with DNA oligonucleotide probes synthesized as
complementary (antisense) to predicted GAM RNA
oligonucleotides. Gels were transferred onto a Biodyne PLUS
0.45 micrometer (Pall) positively charged nylon membrane and
UV cross-linked. Hybridization was performed overnight with
DIG-labeled probes at 42°C in DIG Easy-Hyb buffer (Roche).
Membranes were washed twice with 2xSSC and 0.1% SDS for 10
minutes at 42°C and then washed twice with 0.5xSSC and 0.1%
SDS for 5 minutes at 42°C. The membrane was then developed by
using a DIG luminescent detection kit (Roche) using anti-DIG
and CSPD reaction, according to the manufacturer's protocol.
All probes were prepared according to the manufacturer's
(Roche Molecular Biochemicals) protocols: Digoxigenin (DIG)
labeled antisense transcripts were prepared from purified PCR
products using a DIG RNA labeling kit with T3 RNA polymerase.
DIG-labeled PCR was prepared by using a DIG PCR labeling kit.
3'-DIG-tailed oligo ssDNA antisense probes, containing
DIG-dUTP and dATP at an average tail length of 50 nts, were
prepared from 100pmole oligonucleotides with the DIG
Oligonucleotide Labeling Kit. Control reactions contained all
of the components of the test reaction except library
template.
[0369] VALIDATION OF PCR PRODUCT BY NESTED PCR ON THE
LIGATION

[0370] To further validate predicted GAM PCR product sequence
derived from hemi-primers, a PCR-based diagnostic technique
was devised to amplify only those products containing at least
two additional nucleotides of the non hemi-primer defined part
of the predicted GAM RNA oligonucleotide. In essence, a
diagnostic primer was designed so that its 3' end, which is
the specificity determining side, was identical to the desired
GAM RNA oligonucleotide, 2-10 nts (typically 4-7, chosen for
maximum specificity) further into its 3' end than the
nucleotide stretch primed by the hemi-primer. The hemi-primer
PCR product was first ligated into a T-cloning vector (pTZ57/T
or pGEM-T) as described hereinabove. The ligation reaction
mixture was used as template for the diagnostic PCR under
strict annealing conditions with the new diagnostic primer in
conjunction with a general plasmid-homologous primer,
resulting in a distinct ~200 base-pair product. This PCR
product can be directly sequenced, permitting the elucidation
of the remaining nucleotides up to the 3' of the mature GAM
RNA oligonucleotide adjacent to the 3' adapter. Alternatively,
following analysis of the diagnostic PCR reaction on an
agarose gel, positive ligation reactions (containing a band of
the expected size) were transformed into E. coli. Using this
same diagnostic technique and as an alternative to screening
by Southern blot colony hybridization, transformed bacterial
colonies were screened by colony-PCR (Gussow, D. and Clackson,
T., Nucleic Acids Res. 17:4000 (1989)) with the nested primer
and the vector primer, prior to plasmid purification and
sequencing.
[0371] VALIDATION OF PCR PRODUCT BY CLONING AND SEQUENCING

[0372] PCR products were inserted into pGEM-T (Promega) or
pTZ57/T (MBI Fermentas), heat-shock transformed into
competent JM109 E. coli (Promega) and seeded on LB-Ampicillin
plates with IPTG and Xgal. White and light blue colonies were
transferred to duplicate gridded plates, one of which was
blotted onto a membrane (Biodyne Plus, Pall) for hybridization
with DIG tailed oligo probes (according to instructions,
Roche) complementary to the expected GAM. Plasmid DNA from
positive colonies was sequenced.

[0373] It is appreciated that the results summarized in Fig. 13
validate the efficacy of the bioinformatic oligonucleotide
detection engine 100 of the present invention.

[0374] Reference is now made to Fig. 14A, which is a schematic 
representation of a novel human GR polynucleotide, lo- 
cated on chromosome 9, comprising 2 known human 
miRNA oligonucleotides - MIR24 and MIR23, and 2 novel 
GAM oligonucleotides, herein designated GAM7617 and 
GAM252 (later discovered by other researchers as hsa- 
mir-27b), all marked by solid black boxes. Fig. 14A also 
schematically illustrates 6 non-GAM hairpin sequences, 
and one non-hairpin sequence, all marked by white 
boxes, and serving as negative controls. By "non-GAM hairpin
sequences" is meant sequences of a similar length to known
miRNA precursor sequences, which form a hairpin secondary
folding pattern similar to miRNA precursor hairpins, and yet
which are assessed by the bioinformatic oligonucleotide
detection engine 100 not to be valid GAM PRECURSOR hairpins.
It is appreciated that Fig. 14A is a
simplified schematic representation, reflecting only the 
order in which the segments of interest appear relative to 
one another, and not a proportional distance between the 
segments. 

[0375] Reference is now made to Fig. 14B, which is a schematic
representation of secondary folding of each of the MIRs and
GAMs of the GR (MIR24, MIR23, GAM7617 and GAM252), and of the
negative control non-GAM hairpins, herein designated N2, N3,
N252, N4, N6 and N7. N0 is a non-hairpin control, of a similar
length to that of known miRNA precursor hairpins. It is
appreciated that the negative controls are situated adjacent
to and in between real miRNA oligonucleotides and GAM
predicted oligonucleotides and demonstrate similar secondary
folding patterns to that of known MIRs and GAMs.

[0376] Reference is now made to Fig. 14C, which is a picture of
laboratory results of a PCR test upon a YM100 size-fractionated
"ligation" library, utilizing a set of specific primer pairs
located directly inside the boundaries of the hairpins. Due to
the nature of the library, the only PCR-amplifiable products
can result from RNase III type enzyme cleaved RNA, as expected
for legitimate hairpin precursors presumed to be produced by
DROSHA (Lee et al, Nature 425 415-419, 2003). Fig. 14C
demonstrates expression of hairpin precursors of known miRNA
oligonucleotides hsa-mir23 and hsa-mir24, and of novel GAM7617
and GAM252 hairpins predicted bioinformatically by a system
constructed and operative in accordance with a preferred
embodiment of the present invention. Fig. 14C also shows that
none of the 7 controls (6 hairpins designated N2, N3, N23, N4,
N6 and N7 and 1 non-hairpin sequence designated N0) were
expressed. N252 is a negative control sequence partially
overlapping GAM252.

[0377] In the picture, test lanes including template are desig- 
nated "+" and the control lane is designated "-". The con- 
trol reaction contained all the components of the test re- 
action except library template. It is appreciated that for 
each of the tested hairpins, a clear PCR band appears in
the test ("+") lane, but not in the control ("-") lane. 



[0378] Figs. 14A through 14C, when taken together, validate the
efficacy of the bioinformatic oligonucleotide detection engine
in: (a) detecting known miRNA oligonucleotides; (b) detecting
novel GAM PRECURSOR hairpins which are found adjacent to these
miRNA oligonucleotides, and which despite exhaustive prior
biological efforts and bioinformatic detection efforts, went
undetected; (c) discerning between GAM (or MIR) PRECURSOR
hairpins, and non-GAM hairpins.

[0379] It is appreciated that the ability to discern GAM-hairpins
from non-GAM-hairpins is very significant in detecting
GAM oligonucleotides since hairpins are highly abundant
in the genome. Other miRNA prediction programs have 
not been able to address this challenge successfully. 

[0380] Reference is now made to Fig. 15A, which is an annotated
sequence of an EST comprising a novel GAM oligonucleotide
detected by the oligonucleotide detection system of the
present invention. Fig. 15A shows the nucleotide sequence of a
known human non-protein-coding EST (Expressed Sequence Tag),
identified as EST72223. The EST72223 clone obtained from TIGR
database (Kirkness and Kerlavage, 1997) was sequenced to yield
the above 705bp transcript with a polyadenyl tail. It is
appreciated that the sequence of this EST comprises sequences
of one known miRNA oligonucleotide, identified as hsa-MIR98,
and of one novel GAM oligonucleotide referred to here as
GAM25, detected by the bioinformatic oligonucleotide detection
engine 100 (Fig. 2) of the present invention.

[0381] The sequences of the precursors of the known MIR98 and 
of the predicted GAM25 precursors are marked in bold, 
the sequences of the established miRNA 98 and of the 
predicted miRNA-like oligonucleotide GAM25 are underlined.

[0382] Reference is now made to Figs. 15B, 15C and 15D, which
are pictures of laboratory results, which when taken together
demonstrate laboratory confirmation of expression of the
bioinformatically-detected novel oligonucleotide of Fig. 15A.
In two parallel experiments, an enzymatically synthesized
capped EST72223 RNA transcript was incubated with HeLa S100
lysate for 0 minutes, 4 hours and 24 hours. RNA was
subsequently harvested, run on a denaturing polyacrylamide
gel, and reacted with either a 102 nt antisense MIR98 probe or
a 145 nt antisense GAM25 precursor transcript probe,
respectively. The Northern blot results of these experiments
demonstrated processing of EST72223 RNA by HeLa lysate (lanes
2-4, in Figs. 15B and 15C), into ~80bp and ~22bp segments,
which reacted with the MIR98 precursor probe (Fig. 15B), and
into ~100bp and ~24bp segments, which reacted with the GAM25
precursor probe (Fig. 15C). These results demonstrate the
processing of EST72223 by HeLa lysate into MIR98 precursor and
GAM25 precursor. It is also appreciated from Fig. 15C (lane 1)
that HeLa lysate itself reacted with the GAM25 precursor
probe, in a number of bands, including a ~100bp band,
indicating that GAM25-precursor is endogenously expressed in
HeLa cells. The presence of additional bands, higher than
100bp in lanes 5-9, probably corresponds to the presence of
nucleotide sequences in HeLa lysate, which contain the GAM25
sequence.
[0383] In addition, in order to demonstrate the kinetics and
specificity of the processing of MIR98 and GAM25 precursors
into their respective mature, "diced" segments, transcripts of
MIR98 and of the bioinformatically predicted GAM25 precursors
were similarly incubated with HeLa S100 lysate, for 0 minutes,
30 minutes, 1 hour and 24 hours, and for 24 hours with the
addition of EDTA, added to inhibit Dicer activity, following
which RNA was harvested, run on a polyacrylamide gel and
reacted with MIR98 and GAM25 precursor probes. Capped
transcripts were prepared for in vitro RNA cleavage assays
with T7 RNA polymerase, including a m7G(5')ppp(5')G-capping
reaction using the T7-mMessage mMachine kit (Ambion). Purified
PCR products were used as template for the reaction. These
were amplified for each assay with specific primers containing
a T7 promoter at the 5' end and a T3 RNA polymerase promoter
at the 3' end. Capped RNA transcripts were incubated at 30°C
in supplemented, dialysis concentrated, HeLa S100 cytoplasmic
extract (4C Biotech, Seneffe, Belgium). The HeLa S100 was
supplemented by dialysis to a final concentration of 20mM
Hepes, 100mM KCl, 2.5mM MgCl2, 0.5mM DTT, 20% glycerol and
protease inhibitor cocktail tablets (Complete mini, Roche
Molecular Biochemicals). After addition of all components,
final concentrations were 100mM capped target RNA, 2mM ATP,
0.2mM GTP, 500U/ml RNasin, 25 microgram/ml creatine kinase,
25mM creatine phosphate, 2.5mM DTT and 50% S100 extract.
Proteinase K, used to enhance Dicer activity (Zhang et al.,
EMBO J. 21, 5875-5885 (2002)), dissolved in 50mM Tris-HCl pH
8, 5mM CaCl2, and 50% glycerol, was added to a final
concentration of 0.6 mg/ml. Cleavage reactions were stopped by
the addition of 8 volumes of proteinase K buffer (200mM
Tris-HCl, pH 7.5, 25mM EDTA, 300mM NaCl, and 2% SDS) and
incubated at 65°C for 15min at different time points (0, 0.5,
1, 4, 24h) and subjected to phenol/chloroform extraction.
Pellets were dissolved in water and kept frozen. Samples were
analyzed on a segmented half 6%, half 13% polyacrylamide
1xTBE-7M Urea gel.

[0384] The Northern blot results of these experiments
demonstrated an accumulation of a ~22bp segment which reacted
with the MIR98 precursor probe, and of a ~24bp segment which
reacted with the GAM25 precursor probe, over time (lanes 5-8).
Absence of these segments when incubated with EDTA (lane 9),
which is known to inhibit Dicer enzyme (Zhang et al., 2002),
supports the notion that the processing of MIR98 and GAM25
precursors into their "diced" segments is mediated by Dicer
enzyme, found in HeLa lysate. Other RNases do not utilize
divalent cations and are thus not inhibited by EDTA. The
molecular sizes of EST72223, MIR-98 and GAM25 and their
corresponding precursors are indicated by arrows.

[0385] Fig. 15D presents Northern blot results of the same
experiments with a GAM25 probe (24 nt). The results clearly
demonstrated the accumulation of mature GAM25 oligonucleotide
after 24 h.

[0386] To validate the identity of the band shown by the lower
arrow in Figs. 15C and 15D, an RNA band parallel to a marker
of 24 bases was excised from the gel and cloned as in Elbashir
et al (2001) and sequenced. Ninety clones corresponded to the
sequence of mature GAM25 oligonucleotide, three corresponded
to GAM25* (the opposite arm of the hairpin with a 1-3 nt 3'
overhang) and two to the hairpin-loop.

[0387] GAM25 was also validated endogenously by sequencing
from both sides from a HeLa YM100 total-RNA "ligation"
library, utilizing hemispecific primers as described in
Fig. 13.

[0388] Taken together, these results validate the presence and
processing of a novel miRNA-like oligonucleotide, GAM25, which
was predicted bioinformatically. The processing of this novel
GAM oligonucleotide product, by HeLa lysate from EST72223,
through its precursor, to its final form was similar to that
observed for the known miRNA oligonucleotide MIR98.

[0389] Transcript products were 705 nt (EST72223), 102 nt
(MIR98 precursor), 125 nt (GAM25 precursor) long. EST72223
was PCR-amplified with T7-EST72223 forward primer:
5'-TAATACGACTCACTATAGGCCCTTATTAGAGGATTCTGCT-3' and
T3-EST72223 reverse primer:
5'-AATTAACCCTCACTAAAGGIIIIIIIIICCTGAGACAGAGT-3'. MIR98 was
PCR-amplified using EST72223 as a template with T7MIR98
forward primer:
5'-TAATACGACTCACTATAGGGTGAGGTAGTAAGTTGTATTGTT-3' and T3MIR98
reverse primer:
5'-AATTAACCCTCACTAAAGGGAAAGTAGTAAGTTGTATAGTT-3'. GAM25 was
PCR-amplified using EST72223 as a template with GAM25 forward
primer: 5'-GAGGCAGGAGAATTGCTTGA-3' and T3-EST72223 reverse
primer: 5'-AATTAACCCTCACTAAAGGCCTGAGACAGAGTCTTGCTC-3'.

[0390] It is appreciated that the data presented in Figs. 15A, 15B, 
15C and 15D when taken together validate the function of 
the bioinformatic oligonucleotide detection engine 100 of 
Fig. 2. Fig. 15A shows a novel GAM oligonucleotide bioin- 
formatically-detected by the bioinformatic oligonucleotide 
detection engine 100, and Figs. 15C and 15D show laboratory
confirmation of the expression of this novel oligonucleotide.
This is in accord with the engine training
and validation methodology described hereinabove with 
reference to Fig. 2. 

[0391] Reference is now made to Figs. 16A-C, which schematically
represent three methods that are employed to identify GAM
FOLDED PRECURSOR RNA from libraries. Each method involves the
design of specific primers for PCR amplification followed by
sequencing. The libraries include hairpins as double-stranded
DNA with two different adaptors ligated to their 5' and 3'
ends.

[0392] Reference is now made to Fig. 16A, which depicts a first
method that uses primers designed to the stems of the
hairpins. Since the stem of the hairpins often has bulges,
mismatches, as well as G-T pairing, which is less significant
in DNA than is G-U pairing in the original RNA hairpin, the
primer pairs were engineered to have the lowest possible match
to the other strand of the stem. Thus, the F-Stem primer,
derived from the 5' stem region of the hairpin, was chosen to
have minimal match to the 3' stem region of the same hairpin.
Similarly, the R-Stem primer, derived from the 3' region of
the hairpin (reverse complementary to its sequence), was
chosen to have minimal match to the 5' stem region of the same
hairpin. The F-Stem primer was extended in its 5' sequence
with the T3 primer (5'-ATTAACCCTCACTAAAGGGA-3') and the
R-Stem primer was extended in its 5' sequence with the T7
primer (5'-TAATACGACTCACTATAGGG-3'). The extension is needed
to obtain a large enough fragment for direct sequencing of the
PCR product. Sequence data from the amplified hairpins is
obtained in two ways. One way is the direct sequencing of the
PCR products using the T3 primer that matches the extension of
the F-Stem primer. Another way is the cloning of the PCR
products into a plasmid, followed by PCR screening of
individual bacterial colonies using a primer specific to the
plasmid vector and either the R-Loop (Fig. 16B) or the F-Loop
(Fig. 16C) primer. Positive PCR products are then sent for
direct sequencing using the vector-specific primer.
[0393] Reference is now made to Fig. 16B, which depicts a second
method in which R-Stem and R-Loop primers are used in a
nested-PCR approach. First, PCR is performed with the R-Stem
primer and the primer that matches the 5' adaptor sequence
(5-ad primer). PCR products are then amplified in a second PCR
using the R-Loop and 5-ad primers. As mentioned hereinabove,
sequence data from the amplified hairpins is obtained in two
ways. One way is the direct sequencing of the PCR products
using the 5-ad primer. Another way is the cloning of the PCR
products into a plasmid, followed by PCR screening of
individual bacterial colonies using a primer specific to the
plasmid vector and F-Stem primer. Positive PCR products are
then sent for direct sequencing using the vector-specific
primer. It should be noted that optionally an extended R-Loop
primer is designed that includes a T7 sequence extension, as
described hereinabove (Fig. 16A) for the R-Stem primer. This
is important in the first sequencing option in cases where the
PCR product is too short for sequencing.
[0394] Reference is now made to Fig. 16C, which depicts a third
method, which is the exact reverse of the second method
described hereinabove (Fig. 16B). F-Stem and F-Loop primers
are used in a nested-PCR approach. First, PCR is performed
with the F-Stem primer and the primer that matches the 3'
adaptor sequence (3-ad primer). PCR products are then
amplified in a second PCR using the F-Loop and 3-ad primers.
As in the other two methods, sequence data from the amplified
hairpins is obtained in two ways. One way is the direct
sequencing of the PCR products using the F-Loop primer.
Another way is the cloning of the PCR products into a plasmid,
followed by PCR screening of individual bacterial colonies
using a primer specific to the plasmid vector and R-Stem
primer. Positive PCR products are then sent for direct
sequencing using the vector-specific primer. It should be
noted that optionally an extended F-Loop primer is designed
that includes a T3 sequence extension, as described
hereinabove (Fig. 16A) for the F-Stem primer. This is
important in the first sequencing option in cases where the
PCR product is too short for sequencing and also in order to
enable the use of T3 primer.

[0395] In an embodiment of the present invention, the three
methods mentioned hereinabove may be employed to validate the
expression of GAM FOLDED PRECURSOR RNA.

[0396] Reference is now made to Fig. 17A, which is a flow chart 
with a general description of the design of the microarray 
to identify expression of published miRNA oligonu- 
cleotides, and of novel GAM oligonucleotides of the 
present invention. 

[0397] A microarray that identifies miRNA oligonucleotides is
designed (Fig. 17B). The DNA microarray is prepared by Agilent
according to their SurePrint Procedure (reference describing
their technology can be obtained from the Agilent website,
http://www.agilent.com). In this procedure, the
oligonucleotide probes are synthesized on the glass surface.
Other methods can also be used to prepare such a microarray,
including the printing of pre-synthesized oligonucleotides on
a glass surface or using the photolithography method developed
by Affymetrix (Lockhart DJ et al., Nat Biotechnol. 14:
1675-1680 (1996)). The 60-mer sequences from the design are
synthesized on the DNA microarray. The oligonucleotides on the
microarray, termed "probes", are of the exact sequence as the
designed 60-mer sequences. Importantly, the 60-mer sequences
and the probes are in the sense orientation with regard to the
miRNA oligonucleotides. Next, a cDNA library is created from
size-fractionated RNA, amplified, and converted back to RNA
(Fig. 17C). The resulting RNA is termed "cRNA". The conversion
to RNA is done using a T7 RNA polymerase promoter found on the
3' adaptor (Fig. 17C; T7 NcoI-RNA-DNA 3'Adaptor). Since the
conversion to cRNA is done in the reverse direction compared
to the orientation of the miRNA oligonucleotides, the cRNA is
reverse complementary to the probes and is able to hybridize
to them. This amplified RNA is hybridized with the microarray
that identifies miRNA oligonucleotides, and the results are
analyzed to indicate the relative level of miRNA
oligonucleotides (and hairpins) that are present in the total
RNA of the tissue (Fig. 18).

[0398] Reference is now made to Fig. 17B, which describes how 
the microarray to identify miRNA oligonucleotides is de- 
signed. miRNA oligonucleotide sequences or potential 
predicted miRNA oligonucleotides are generated by using 
known or predicted hairpins as input. Overlapping poten- 
tial miRNA oligonucleotides are combined to form one 
larger sub-sequence within a hairpin. 

[0399] To generate non-expressed sequences (tails), artificial
sequences are generated that are 40 nts in length, which do
not appear in the respective organism genome, do not have
greater than 40% homology to sequences that appear in the
genome, and with no 15-nucleotide window that has greater than
80% homology to sequences that appear in the genome.
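
The tail acceptance criteria above can be illustrated with a toy Python filter. This is a sketch under stated assumptions: "homology" is interpreted here as the best ungapped identity against any equally long stretch of a reference string, the reference is a short stand-in for a genome, and all names are hypothetical. A screen against a real genome would require an alignment tool rather than this brute-force scan.

```python
def max_ungapped_identity(query, reference):
    """Best fraction of matching positions between `query` and any
    equally long window of `reference` (no gaps allowed)."""
    q = len(query)
    best = 0.0
    for i in range(len(reference) - q + 1):
        window = reference[i:i + q]
        matches = sum(a == b for a, b in zip(query, window))
        best = max(best, matches / q)
    return best


def passes_tail_filter(tail, reference, full_max=0.40,
                       window=15, window_max=0.80):
    """Apply the two thresholds from the text: at most 40% identity for
    the whole tail and at most 80% for every 15-nucleotide window."""
    if max_ungapped_identity(tail, reference) > full_max:
        return False
    for i in range(len(tail) - window + 1):
        if max_ungapped_identity(tail[i:i + window], reference) > window_max:
            return False
    return True


if __name__ == "__main__":
    toy_reference = "A" * 60           # hypothetical stand-in for a genome
    print(passes_tail_filter("C" * 40, toy_reference))
    print(passes_tail_filter("C" * 25 + "A" * 15, toy_reference))
```

The second example passes the whole-tail threshold (15 of 40 positions match) but is rejected because one 15-nucleotide window matches the reference perfectly, which is the case the window criterion is meant to catch.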

[0400] To generate probe sequences, the most probable miRNA
oligonucleotide sequences are placed at position 3 (from the
5' end) of the probe. Then, a tail sub-sequence is attached to
the miRNA oligonucleotide sequence such that the combined
sequence length meets the required probe length (60 nts for
Agilent microarrays).



[0401] The tails method provides better specificity compared to
the triplet method. In the triplet method, it cannot be
ascertained that the design sequence, and not an uncontrolled
window from the triplet probe sequence, was responsible for
hybridizing to the probe. Further, the tails method allows the
use of different lengths for the potential predicted miRNA
oligonucleotide (of combined, overlapping miRNA
oligonucleotides).

[0402] Hundreds of control probes were examined in order to
ensure the specificity of the microarray. Negative controls
contain probes which should have low intensity signal. For
other control groups, the concentration of certain specific
groups of interest in the library is monitored. Negative
controls include tail sequences and non-hairpin sequences.
Other controls include mRNA for coding genes, tRNA, and
snoRNA.

[0403] For each probe that represents known or predicted miRNA 
oligonucleotides, additional mismatch probes were as- 
signed in order to verify that the probe intensity is due to 
perfect match (or as close as possible to a perfect match) 
binding between the target miRNA oligonucleotide cRNA 
and its respective complementary sequence on the probe. 
Mismatches are generated by changing nucleotides in different
positions on the probe with their respective complementary
nucleotides (A <-> T, G <-> C, and vice versa). Mismatches in
the tail region should not generate a
significant change in the intensity of the probe signal, 
while mismatches in the miRNA oligonucleotide sequences 
should induce a drastic decrease in the probe intensity 
signal. Mismatches at various positions within the miRNA 
oligonucleotide sequence enable us to detect whether the 
binding of the probe is a result of perfect match or, alter- 
natively, nearly perfect match binding. 
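
The mismatch rule described above (each selected nucleotide is replaced by its complement) can be sketched in Python as follows. The function name and the example sequence are hypothetical, and positions are assumed to be counted from 1 at the 5' end of the sequence passed in.

```python
_SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def introduce_mismatches(sequence, positions):
    """Replace the nucleotide at each 1-based position with its
    complement (A <-> T, G <-> C) and return the mutated sequence."""
    bases = list(sequence.upper())
    for pos in set(positions):  # a repeated position is applied only once
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        bases[pos - 1] = _SWAP[bases[pos - 1]]
    return "".join(bases)


if __name__ == "__main__":
    toy = "ACGTACGT"  # hypothetical toy sequence
    print(introduce_mismatches(toy, [1, 8]))
```

Because every substitution is a complement swap, applying the same positions twice restores the original sequence, which gives a simple way to check a mismatch probe against its perfect-match parent.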

[0404] Based on the above scheme, we designed a DNA microarray
prepared by Agilent using their SurePrint technology.
Table 11 is a detailed list of microarray chip probes.

[0405] KNOWN miRNA OLIGONUCLEOTIDES: 

[0406] The miRNA oligonucleotides and their respective precur- 
sor sequences are taken from Sanger Database to yield a 
total of 186 distinct miRNA oligonucleotide and precursor 
pairs. The following different probes are constructed: 

[0407] 1. SINGLE miRNA OLIGONUCLEOTIDE PROBES: 

[0408] From each precursor, a 26-mer containing the miRNA
oligonucleotide was taken, and 3 probes were assigned for each
extended miRNA oligonucleotide sequence: 1. the 26-mer is at
the 5' of the 60-mer probe, 2. the 26-mer is at the 3' of the
60-mer probe, 3. the 26-mer is in the middle of the 60-mer
probe. Two different 34-mer subsequences from the design tails
are attached to the 26-mer to complete the 60-mer probe. For a
subset of 32 of the Single miRNA oligonucleotide probes, six
additional mismatch mutation probes were designed:
[0409] 4 block mismatches at 5' end of the miRNA oligonucleotide;

[0410] 5 block mismatches at 3' end of the miRNA oligonucleotide;

[0411] 1 mismatch at position 10 of the miRNA oligonucleotide;

[0412] 2 mismatches at positions 8 and 17 of the miRNA oligonucleotide;

[0413] 3 mismatches at positions 6, 12 and 18 of the miRNA oligonucleotide;
and

[0414] 5 mismatches at different positions out of the miRNA
oligonucleotide.
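
The three single-probe layouts (26-mer at the 5' end, at the 3' end, or in the middle of a 60-mer) can be sketched in Python as follows. The text does not state which of the two 34-mer tails is used in each layout or how the tail is divided for the middle layout; the choices below (17 nucleotides on each side for the middle probe) are assumptions, and all names and sequences are hypothetical.

```python
def single_probes(mer26, tail_a, tail_b):
    """Return the three 60-mer layouts for one 26-mer.

    Assumptions: tail_a follows the 26-mer in the 5' layout, tail_b
    precedes it in the 3' layout, and the middle layout is flanked by
    the first 17 nt of each tail.
    """
    if len(mer26) != 26 or len(tail_a) != 34 or len(tail_b) != 34:
        raise ValueError("expected a 26-mer and two 34-mer tails")
    return {
        "5prime": mer26 + tail_a,
        "3prime": tail_b + mer26,
        "middle": tail_a[:17] + mer26 + tail_b[:17],
    }


if __name__ == "__main__":
    mer = "ACGT" * 6 + "AC"           # hypothetical 26-mer
    probes = single_probes(mer, "G" * 34, "T" * 34)
    for name, seq in probes.items():
        print(name, len(seq), seq)
```

Each layout keeps the 26-mer intact and varies only its position, so differences in signal between the three probes reflect the effect of position on the probe rather than of sequence.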
[0415] 2. DUPLEX miRNA OLIGONUCLEOTIDE PROBES: 

[0416] From each precursor, a 30-mer containing the miRNA
oligonucleotide was taken, then duplicated to obtain a 60-mer
probe. For a subset of 32 probes, three additional mismatch
mutation probes were designed:
[0417] 2 mismatches on the first miRNA oligonucleotide; 

[0418] 2 mismatches on the second miRNA oligonucleotide; and 

[0419] 2 mismatches on each of the miRNA oligonucleotides. 

[0420] 3. TRIPLET miRNA OLIGONUCLEOTIDE PROBES: 

[0421] Following Krichevsky's work (Krichevsky et al., RNA
9:1274-1281 (2003)), head to tail ~22-mer length miRNA
oligonucleotide sequences were attached to obtain 60-mer
probes containing up to three repeats of the same miRNA
oligonucleotide sequence. For a subset of 32 probes, three
additional mismatch mutation probes were designed:

[0422] 2 mismatches on the first miRNA oligonucleotide; 
[0423] 2 mismatches on the second miRNA oligonucleotide; and 
[0424] 2 mismatches on each of the miRNA oligonucleotides. 
[0425] 4. PRECURSOR WITH miRNA OLIGONUCLEOTIDE PROBES: 

[0426] For each precursor, a 60-mer containing the miRNA
oligonucleotide was taken.
[0427] 5. PRECURSOR WITHOUT miRNA OLIGONUCLEOTIDE PROBES:



[0428] For each precursor, a 60-mer containing no more than a
16-mer of the miRNA oligonucleotide was taken. For a
subset of 32 probes, additional mismatch probes
containing four mismatches were designed.

[0429] CONTROL GROUPS:

[0430] 1. 100 60-mer sequences from representative ribosomal 
RNAs. 

[0431] 2. 85 60-mer sequences from representative tRNAs.

[0432] 3. 19 60-mer sequences from representative snoRNAs.

[0433] 4. 294 random 26-mer sequences from the human genome
not contained in published or predicted precursor
sequences, each placed at the 5' end of the probe and
attached to the 34-mer tail described above.

[0434] 5. Negative Control: 182 different 60-mer probes
containing different combinations of 10 nt-long sequences, in
which each 10 nt-long sequence is very rare in the human
genome, and the 60-mer combination is extremely rare.

[0435] PREDICTED GAM RNAs:

[0436] There are 8381 pairs of predicted GAM RNAs and their
respective precursors. From each precursor, a 26-mer
containing the GAM RNA was placed at the 5' end of the 60-mer
probe and a 34-mer tail was attached to it. For each
predicted probe, a mutation probe with 2 mismatches at
positions 10 and 15 of the GAM RNA was added.

[0437] For a subset of 660 predicted precursors, up to 2 probes 
each containing one side of the precursor including any 
possible GAM RNA in it were added. 

[0438] Microarray analysis:

[0439] Based on known miRNA oligonucleotide probes, a
preferred position of the miRNA oligonucleotide on the probe
was evaluated, and the hybridization conditions and the
amount of cRNA that optimize microarray sensitivity and
specificity were ascertained. Negative controls are used to
calculate the background signal mean and standard deviation.
Different probes of the same miRNA oligonucleotide are
used to calculate signal standard deviation as a function
of the signal.

[0440] For each probe, BG_Z_Score = (log(probe signal) - mean
of log(negative control signal))/(standard deviation of
log(negative control signal)) was calculated.

[0441] For a probe with a reference probe with 2 mismatches on
the miRNA oligonucleotide, MM_Z_Score = (log(perfect match
signal) - log(reference mismatch signal))/(standard deviation
of log(signal) at the reference mismatch log(signal)) was
calculated.



[0442] BG_Z_Score and MM_Z_Score are used to decide whether 
the probe is on and its reliability. 
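
The two scores can be sketched in a few lines. This is an illustrative sketch, not the procedure actually used: the natural logarithm and the sample standard deviation are assumptions (the text specifies neither the log base nor the estimator), and the signal-dependent standard deviation used for the mismatch score is passed in as a number because the text does not describe how it is fitted.

```python
import math
import statistics


def bg_z_score(probe_signal: float, negative_control_signals) -> float:
    """Z-score of a probe's log signal against the negative-control log signals."""
    logs = [math.log(s) for s in negative_control_signals]
    bg_mean = statistics.mean(logs)
    bg_std = statistics.stdev(logs)  # sample standard deviation (assumption)
    return (math.log(probe_signal) - bg_mean) / bg_std


def mm_z_score(perfect_match_signal: float,
               mismatch_signal: float,
               log_signal_std: float) -> float:
    """Z-score of a perfect-match probe against its 2-mismatch reference probe.

    log_signal_std is the standard deviation of log(signal) expected at the
    mismatch probe's signal level; it is supplied by the caller here.
    """
    return (math.log(perfect_match_signal)
            - math.log(mismatch_signal)) / log_signal_std
```

For example, with negative-control log signals of 1, 2 and 3 (mean 2, sample standard deviation 1), a probe whose log signal is 6 has a BG_Z_Score of 4.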

[0443] Reference is now made to Fig. 17C, which is a flowchart 
describing how the cDNA library was prepared from RNA 
and amplified. The general procedure was performed as 
described previously (Elbashir SM, Lendeckel W, Tuschl T.
RNA interference is mediated by 21- and 22-nucleotide 
RNAs. Genes Dev. 2001 15:188-200) with several modifi- 
cations, which will be described hereinbelow. 

[0444] First, the starting material is prepared. Instead of starting 
with standard total RNA, the total RNA was
size-fractionated using a YM-100 Microcon column (Millipore
Corporation, Billerica, Massachusetts, USA) in the present 
protocol. Further, the present protocol uses human tissue 
or cell lines instead of a Drosophila in vitro system as 
starting materials. Finally, 3 micrograms of size- 
fractionated total RNA was used for the ligation of adaptor 
sequences. 

[0445] Libraries used for microarray hybridization are listed
hereinbelow: "A" library is composed of a mix of libraries
from Total HeLa YM100 RNA and Nuclear HeLa YM100
RNA; "B" library is composed of a mix of libraries from
Total HEK293 YM100 RNA and Nuclear HEK293 YM100
RNA; "C" library is composed of a mix of YM100 RNA
libraries from Total PC3, Nuclear PC3 and from PC3 cells in
which Dicer expression was transiently silenced by
Dicer-specific siRNA; "D" library is prepared from YM100 RNA
from Total Human Brain (Ambion Cat#7962); "E" library is
prepared from YM100 RNA from Total Human Liver
(Ambion Cat#7960); "F" library is prepared from YM100
RNA from Total Human Thymus (Ambion Cat#7964); "G"
library is prepared from YM100 RNA from Total Human
Testis (Ambion Cat#7972); and "H" library is prepared
from YM100 RNA from Total Human Placenta (Ambion
Cat#7950).

[0446] Library letters appended by a numeral "1" or "2" are
digested by XbaI (NEB); library letters appended by a numeral
"3" are digested by XbaI and SpeI (NEB); library letters
appended by a numeral "4" are digested by XbaI and the
transcribed cRNA is then size-fractionated by YM30,
retaining the upper fraction consisting of 60 nts and longer;
library letters appended by a numeral "5" are digested by
XbaI and the transcribed cRNA is then size-fractionated
by YM30, retaining the flow-through fraction, which is
subsequently concentrated with YM10 and consists of 30 nts-60 nts;
library letters appended by a numeral "6" are digested by XbaI
and the DNA is fractionated on a 13% native acrylamide
gel from 40-60 nt, electroeluted on a GeBaFlex Maxi column
(GeBa Israel), and lyophilized; library letters appended
by a numeral "7" are digested by XbaI and the DNA is
fractionated on a 13% native acrylamide gel from 80-160
nt, electroeluted and lyophilized.

[0447] Next, unique RNA-DNA hybrid adaptor sequences with a 
T7 promoter were designed. This step also differs
from other protocols that create libraries for microarrays.
Most protocols use complements to the polyA tails of 
mRNA with a T7 promoter to amplify only mRNA. How- 
ever, in the present invention, adaptors are used to am- 
plify all of the RNA within the size-fractionated starting 
material. The adaptor sequences are ligated to the size- 
fractionated RNA as described in Fig. 13, with subsequent 
gel-fractionation steps. The RNA is then converted to first 
strand cDNA using reverse transcription. 

[0448] Next, the cDNA is amplified using PCR with
adaptor-specific primers. At this point, there is the optional step of
removing the tRNA, which is likely to be present because
of its low molecular weight, but may add background
noise in the present experiments. All tRNAs contain the
sequence ACC at their 3' end, and the adaptor contains GGT
at its 5' end. This sequence together (GGTACC) is the
target site for NcoI restriction digestion. Thus, adding the
restriction enzyme NcoI either before or during PCR
amplification will effectively prevent the exponential
amplification of the cDNA sequences that are complements of
the tRNAs.
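
The junction logic can be checked with a short sketch. It assumes that the 3' adaptor begins with UGG, as in the T7 NcoI-RNA-DNA 3' adapter sequence listed later in this section, and that tRNAs end in CCA. Read 5' to 3' along the ligated strand, the junction is then CCAUGG, i.e. CCATGG in DNA, which is the NcoI recognition sequence; this is the same hexamer that the text writes in the opposite direction as GGTACC. The function name and the example sequences are hypothetical.

```python
NCOI_SITE = "CCATGG"
# DNA rendering of the 3' adaptor given in the text (rUrGrG written as TGG).
ADAPTOR_3P = "TGGCCTATAGTGAGTCGTATTA"


def junction_creates_site(small_rna: str,
                          adaptor: str = ADAPTOR_3P,
                          site: str = NCOI_SITE) -> bool:
    """Return True if ligating the adaptor to the RNA 3' end creates the site.

    Only a window of len(site) - 1 bases on each side of the junction is
    examined, so any hit necessarily spans the ligation junction.
    """
    dna = small_rna.upper().replace("U", "T")
    k = len(site) - 1
    window = dna[-k:] + adaptor[:k]
    return site in window
```

A sequence ending in CCA, as tRNAs do, is flagged, whereas a small RNA ending in, say, GUU is not, so only the tRNA-derived templates would be cut.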

[0449] The amplified DNA is restriction enzyme-digested with
XbaI (and, optionally, with PstI or SpeI) to remove the
majority of the adaptor sequences that were initially added to
the RNA. Using the first set of RNA-DNA hybrid adaptors
listed below, the first two sets of primers listed below,
and an XbaI restriction digest yields the following cRNA
products: 5'GGCCA - PRE/miRNA - UAUCUAG, where PRE is
defined as GAM PRECURSOR (palindrome). Using the
second set of RNA-DNA hybrid adaptors listed below, the
second set of primers listed below, and an XbaI and PstI
restriction digest yields the following, smaller cRNA
products: 5'GG - PRE/miRNA - C*.

[0450] Then, cDNA is transcribed to cRNA utilizing an RNA
polymerase, e.g. T7, dictated by the promoter incorporated in
the adaptor. cRNA may be labeled in the course of
transcription with aminoallyl or fluorescent nucleotides such
as Cy3- or Cy5-UTP and GTP among other labels, and
cRNA sequences thus transcribed and labeled are
hybridized with the microarray.

[0451] The following RNA-DNA hybrid adaptors are included in 
the present invention: 

[0452] 1. Name: T7 NcoI-RNA-DNA 3'Adapter

[0453] Sequence:
5'(5phos)rUrGrGCCTATAGTGAGTCGTATTA(3InvdT)3'

[0454] 2. Name: 5Ada RNA-DNA XbaBseRI

[0455] Sequence: 5' AAAGGAGGAGCTCTAGrArUrA 3' or optionally:

[0456] 3. Name: 5Ada MC RNA-DNA PstAtaBser

[0457] Sequence: 5' CCTAGGAGGAGGACGTCTGrCrArG 3'

[0458] 4. Name: 3'Ada nT7 MC RNA-DNA

[0459] Sequence: 5' (5phos) rCrCrUATAGTGAGTCGTATTATCT (3InvdT)3'

[0460] The following DNA primers are included in the present in- 
vention: 

[0461] 1. Name: T7 NcoI-RT-PCR primer

[0462] Sequence: 5' TAATACGACTCACTATAGGCCA 3'

[0463] 2. Name: T7NheI SpeI-RT-PCR primer

[0464] Sequence: 5' GCTAGCACTAGTTAATACGACTCACTATAGGCCA 3'

[0465] 3. Name: 5Ada XbaBseRI Fwd

[0466] Sequence: 5' AAAGGAGGAGCTCTAGATA 3'

[0467] 4. Name: Pst- 5 Ada XbaBseRI Fwd

[0468] Sequence: 5' TGACCTGCAGAAAGGAGGAGCTCTAGATA 3'

[0469] or optionally:

[0470] 5. Name: 5Ada MC PstAtaBser fwd

[0471] Sequence: 5' ATCCTAGGAGGAGGACGTCTGCAG 3'

[0472] 6. Name: RT nT7 MC XbaI

[0473] Sequence: 5' GCTCTAGGATAATACGACTCACTATAGG 3'

[0474] Reference is now made to Fig. 18A, which demonstrates
the detection of known miRNA oligonucleotides and of
novel GAM oligonucleotides, using a microarray
constructed and operative in accordance with a preferred
embodiment of the present invention. Based on negative
control probe intensity signals, we evaluated the
background, non-specific, logarithmic intensity distribution,
and extracted its mean, designated BG_mean, and standard
deviation, designated BG_std. In order to normalize
intensity signals between different microarray
experiments, a Z score, which is a statistical measure that
quantifies the distance (measured in standard deviations) that
a data point is from the mean of a data set, was calculated
for each probe with respect to the negative control using
the following Z score formula: Z = (logarithm of probe
signal - BG_mean)/BG_std. We performed microarray
experiments using RNA extracted from several different tissues
and we calculated each probe's maximum Z score. Fig.
18A shows the percentages of known, predicted and
negative control groups that have a higher max Z score than a
specified threshold as a function of max Z score
threshold. Based on the negative control group plot, included as a
reference, a probe with a max Z score greater than 4 is
considered a reliable probe with meaningful signals. The sensitivity
of our method was demonstrated by the detection of
almost 80% of the known published miRNA oligonucleotides
in at least one of the examined tissues. At a threshold of 4
for the max Z score, 28% of the predicted GAMs are
present in at least one of the examined tissues.
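
The curve described for Fig. 18A, the share of probes whose maximum Z score exceeds a threshold, can be sketched as follows. The function names and the toy Z scores in the usage note are hypothetical and are not data from the experiments.

```python
def max_z_per_probe(z_by_probe: dict) -> dict:
    """Map each probe to its highest Z score across all tissues."""
    return {probe: max(scores) for probe, scores in z_by_probe.items()}


def fraction_above_threshold(max_z_scores, threshold: float) -> float:
    """Fraction of probes whose max Z score is strictly above the threshold."""
    scores = list(max_z_scores)
    if not scores:
        raise ValueError("no probes supplied")
    return sum(z > threshold for z in scores) / len(scores)
```

Applied separately to the known, predicted and negative control groups over a range of thresholds, this yields one curve per group. For instance, a group of four probes with max Z scores of 5.2, 2.0, 4.5 and 0.5 has half of its probes above a threshold of 4.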
[0475] Reference is now made to Fig. 18B, which is a line graph 
showing specificity of hybridization of a microarray con- 
structed and operative in accordance with a preferred em- 



bodiment of the present invention and described herein- 
above with reference to Figs. 17A-17C. 

[0476] The average signal of known miRNA oligonucleotides in
Library A2 is presented on a logarithmic scale as a
function of the following probe types under two different
hybridization conditions, 50 °C and 60 °C: perfect match (PM),
six mismatches on the tail (TAIL MM), one mismatch on
the miRNA oligonucleotide (1MM), two separate
mismatches on the miRNA oligonucleotide (2MM), three
separate mismatches on the miRNA oligonucleotide (3MM).
The relative equality of perfect match probes and probes
with the same miRNA oligonucleotide but many
mismatches over the tail attests to the independence between
the tail and the probe signal. At a hybridization
temperature of 60 °C, one mismatch in the middle of the miRNA
oligonucleotide is enough to dramatically reduce the
probe signal. Conducting chip hybridization at 60 °C
ensures that a probe has a very high specificity.

[0477] It is appreciated that these results demonstrate the
specificity of the microarray of the present invention in
detecting expression of miRNA oligonucleotides.

[0478] Reference is now made to Fig. 18C, which is a summary 
table demonstrating detection of known miRNA oligonu- 



cleotides using a microarray constructed and operative in 
accordance with a preferred embodiment of the present
invention and described hereinabove with reference to 
Figs. 17A-17C. 

[0479] Labeled cRNA from HeLa cells and Human Liver, Brain,
Thymus, Placenta, and Testes was used for 6 different
hybridizations. The table contains the quantitative values
obtained for each miRNA oligonucleotide probe. For each
miRNA oligonucleotide, the highest value (or values) is
given in bold font while lower values are given in
regular font. Results for MIR-124A, MIR-9 and MIR-122A
are exactly as expected from previous studies. The
"References" column contains the relevant references in the
published literature for each case. In addition to these
miRNA oligonucleotides, the table shows other known
miRNA oligonucleotides that are expressed in a
tissue-specific manner. The results indicate that MIR-128A,
MIR-129 and MIR-128B are highly enriched in Brain; MIR-194,
MIR-148 and MIR-192 are highly enriched in Liver;
MIR-96, MIR-150, MIR-205, MIR-182 and MIR-183 are highly
enriched in Thymus; MIR-204, MIR-10B, MIR-154 and
MIR-134 are highly enriched in Testes; and MIR-122,
MIR-210, MIR-221, MIR-141, MIR-23A, MIR-200C and
MIR-136 are highly enriched in Placenta. In most cases, low
but significant levels are observed in the other tissues.
However, in some cases, miRNA oligonucleotides are also
expressed at relatively high levels in an additional tissue.

[0480] It is appreciated that these results reproduce previously 

published studies of expression of known miRNA
oligonucleotides. These results demonstrate the reliability of the
microarray of the present invention in detecting expres- 
sion of published miRNA oligonucleotides, and of novel 
GAM oligonucleotides of the present invention. 

[0481] Reference is now made to Fig. 19, which presents pictures 
of laboratory results that demonstrate laboratory
confirmation of excision ("dicing") of four
bioinformatically-detected novel HIV1 miRNA-like oligonucleotides from their
predicted precursors by incubation in HeLa S-100 lysate 
as described in Fig. 15. 

[0482] Fig. 19A presents the entire 5'UTR of HIV1 (U5R)
containing two predicted GAM precursors in bold. The
bioinformatically-predicted mature GAM RNAs are underlined, one
closer to the 5' end (Fig. 19B) and the second closer to the
3' end (Fig. 19C). The 5'-most GAM RNA matches the
known HIV1 RNA structure named TAR to which the TAT
protein binds (Nature 1987. 330:489-93).



[0483] Figs. 19B and 19C depict Northern blot analysis of GAM 
RNA oligonucleotides that are present in U5R, hybridized 
with predicted mature GAM RNA probes. The upper arrow 
indicates the molecular size of the entire 355 nt U5R tran- 
script. The predicted molecular sizes of the two GAM 
RNAs are 22 nt and 17 nt, respectively. The lower arrow 
indicates the 22 nt molecular marker. Lanes: 1 - HeLa
lysate; 2 - U5R transcript in HeLa lysate without
incubation; and 3 - U5R transcript incubated for 24 hours with
HeLa lysate.

[0484] Figs. 19D and 19E present partial transcripts of HIV1 RNA
reacted with predicted mature HIV1-GAM RNA probes. In
each figure, the experimental transcript sequence is 
shown, and the predicted mature GAM RNA is underlined. 
Northern blot analyses of GAM precursors are presented. 
It is demonstrated that one GAM precursor transcript is 
163 nt and the other GAM precursor transcript is 200 nt. 
The predicted molecular sizes of mature GAM RNA are 
both 24 nt. The 22 nt molecular marker is indicated. 
Lanes: 1 - Transcript in HeLa Lysate without incubation 
and 2 - Transcript incubated for 24 hours with HeLa 
lysate. 

[0485] It is appreciated that the sequences of the expected sizes 



that hybridize with the probe comprise sequences of novel 
GAM oligonucleotides, detected by the bioinformatic
oligonucleotide detection engine 100 of the present in- 
vention, described hereinabove with reference to Fig. 2. 
[0486] Reference is now made to Fig. 20, which presents pictures
of laboratory results that confirm expression of the
bioinformatically-detected novel Vaccinia GAM
oligonucleotides GAM501943 (Figs. 20A and 20C) and
GAM501981 (Fig. 20B). HeLa cells were infected with 50
PFU Vaccinia Virus and total RNA was harvested after 3
days. Northern blot analysis was performed with a 53 nt
DIG-labeled RNA probe for GAM501943 predicted
precursor (Fig. 20A), a 73 nt DIG-labeled RNA probe for
GAM501981 predicted precursor (Fig. 20B) or a 22 nt
32P-ATP-labeled DNA oligonucleotide probe for predicted
mature GAM501943.2 (Fig. 20C). Lanes: 1 - GAM
precursors in total RNA extracted from HeLa cells infected with
Vaccinia Virus; 2 - GAM precursors in total RNA extracted
from HeLa cells that were not infected with Vaccinia Virus;
and 3 - a transcript of predicted sequence and size was
run alongside the other lanes to serve as a size marker
and a hybridization control (Figs. 20A and 20B). The arrow
in Fig. 20C marks a 53 nt band, which is the predicted
precursor size, that reacts with the mature 22 nt
GAM501943.2 probe.

[0487] It is appreciated that the sequences that hybridized with 
the probe appear only in infected cells in vivo and com- 
prise sequences of novel GAM gene precursors, referred 
to here as GAM501943 and GAM501981, detected by the 
bioinformatic gene detection engine 100 of the present 
invention, as described hereinabove with reference to Fig. 
2. 

[0488] Reference is now made to Fig. 21A, which is a picture of
an agarose gel demonstrating the effect of HIV infection
on GAM RNA levels in H9 cells. DNA libraries were created
from H9 cells and from HIV-infected H9 cells as described
in Fig. 17C. Briefly, adaptor sequences were ligated to
YM100 RNA, reverse transcribed to cDNA and amplified to
libraries using PCR with hemispecific primers. The PCR
product was run on an agarose gel and stained with
ethidium bromide. Some representative gels are presented
in Fig. 21A. Lane 1 shows PCR product from a reaction run
with all of the reaction components except for the library
and serves as a control; Lane 2 shows PCR product levels
from an H9 cell library that has not been infected with
HIV; and Lane 3 shows PCR product levels from an H9 cell
library that was infected with HIV.
[0489] It is appreciated that HIV infection of H9 cells increases 
the expression of small, approximately 22 nt-long 
oligonucleotides in certain predicted GAM RNAs (Examples 
1 and 2), while decreasing them in other predicted GAM 
RNAs (Example 3). Lanes that did not contain any libraries
did not show any bands, demonstrating a lack of
contaminating DNA in the procedure. The 60 nt bands
representing 22 nt-long GAM RNA ligated to adaptors were then
excised, cloned and sequenced, with reference to Fig. 21B
hereinbelow.

[0490] Reference is now made to Fig. 21B, which is a table of
laboratory results validating expression of novel human 
oligonucleotides in human cells infected with HIV, that are 
detected by a bioinformatic oligonucleotide detection en- 
gine 100 (Fig. 2), constructed and operative in accordance 
with a preferred embodiment of the present invention. It 
is appreciated that the bioinformatic predictions pre- 
sented here serve only as examples of a large quantity of 
results from the bioinformatic oligonucleotide detection 
engine 100. The bands from the gels depicted in Fig. 21A 
were cloned and sequenced as described hereinabove. In 
brief, a primer was designed such that its first half, the 5'
region, is complementary to the adaptor sequence and its
second half, the 3' region, anneals to the 5' terminus of
the GAM RNA sequence, yielding a hemispecific primer (as
elaborated hereinabove with reference to Fig. 13).
Predicted GAM RNA sequences were examined by PCR using
hemispecific primers and a primer specific to the 3'
adaptor. PCR products were cloned into plasmid vectors and
then sequenced. (The predicted GAM RNAs were verified by
cloning and sequencing using a primer that was originally
designed for a slightly different prediction.)
[0491] The results are presented in a table that includes the fol- 
lowing fields: "Primer Sequence" contains the "specific" 
part of the hemispecific primer; "Sequenced sequence" 
represents the nucleotide sequence detected by cloning 
(excluding the hemispecific primer sequence); "Predicted 
GAM RNA" contains the GAM RNA sequence that is pre- 
dicted by the bioinformatic oligonucleotide detection en- 
gine 100; "GAM precursor sequence" contains the se- 
quence of the GAM precursor RNA that is predicted by the 
bioinformatic oligonucleotide detection engine 100; "Chr"
depicts the human chromosome on which the GAM
precursor lies; "Strand" indicates whether the predicted GAM
precursor lies on the "+" or "-" strand of the chromosome;
and "Start Offset" contains the nucleotide number of the
specified chromosome at which the predicted GAM
precursor sequence begins.
[0492] It is appreciated that the "sequenced sequence" from Row
1 of the table in Fig. 21B was sequenced from the 60 nt
band in Lane 3 of Example 1 of Fig. 21A. It is further
appreciated that the "sequenced sequences" from Rows 2 and 3
were sequenced from the 60 nt band in Lane 3 of Example
2 of Fig. 21A. Row 3 was cloned from five independent clones,
showing robustness of that GAM RNA. It is further
appreciated that the "sequenced sequence" from Row 4 was
sequenced from the 60 nt band in Lane 2 of Example 3 of
Fig. 21A. It is still further appreciated that the "sequenced
sequences" from Rows 5 and 6 were sequenced from the
60 nt band in Lane 3 of Example 3 of Fig. 21A. Thus, it may be
speculated that HIV infection of H9 cells downregulates
the GAM RNA presented in Row 4 of the Table and
simultaneously upregulates the GAM RNAs presented in Rows 5
and 6 of the Table, with an overall downregulation of
band signal intensity.

DETAILED DESCRIPTION OF TABLES 

[0493] Table 1 comprises data relating the SEQ ID NO of oligonu- 
cleotides of the present invention to their corresponding 



GAM NAME, and contains the following fields: GAM SEQ- 
ID: GAM SEQ ID NO, as in the Sequence Listing; GAM 
NAME: Rosetta Genomics Ltd. nomenclature (see below); 
GAM RNA SEQUENCE: Sequence (5' to 3') of the mature, 
"diced" GAM RNA; GAM ORGANISM: identity of the organ- 
ism encoding the GAM oligonucleotide; GAM POS: Dicer- 
cut location (see below); and 

[0494] Table 2 comprises detailed textual description according 
to the description of Fig. 1 of each of a plurality of novel 
GAM oligonucleotides of the present invention, and con- 
tains the following fields: GAM NAME: Rosetta Genomics 
Ltd. nomenclature (see below); GAM ORGANISM: identity 
of the organism encoding the GAM oligonucleotide;
PRECUR SEQ-ID: GAM precursor Seq-ID, as in the Sequence
Listing; PRECURSOR SEQUENCE: Sequence (5' to 3') of the
GAM precursor; GAM DESCRIPTION: Detailed description 
of GAM oligonucleotide with reference to Fig. 1; and 

[0495] Table 3 comprises data relating to the source and location 
of novel GAM oligonucleotides of the present invention, 
and contains the following fields: GAM NAME: Rosetta Ge- 
nomics Ltd. nomenclature (see below); PRECUR SEQ-ID: 
GAM precursor SEQ ID NO, as in the Sequence Listing; 
GAM ORGANISM: identity of the organism encoding the
GAM oligonucleotide; SOURCE: for a human GAM, the
chromosome encoding the human GAM oligonucleotide;
otherwise, the accession ID (GenBank, NCBI); STRAND:
Orientation of the strand, "+" for the plus strand, "-" for the
minus strand; SRC-START OFFSET: Start offset of GAM 
precursor sequence relative to the SOURCE; SRC-END 
OFFSET: End offset of GAM precursor sequence relative to 
the SOURCE; and 

[0496] Table 4 comprises data relating to GAM precursors of 

novel GAM oligonucleotides of the present invention, and 
contains the following fields: GAM NAME: Rosetta Ge- 
nomics Ltd. nomenclature (see below); PRECUR SEQ-ID: 
GAM precursor Seq-ID, as in the Sequence Listing; GAM 
ORGANISM: identity of the organism encoding the GAM 
oligonucleotide; PRECURSOR-SEQUENCE: GAM precursor 
nucleotide sequence (5' to 3'); GAM FOLDED PRECURSOR 
RNA: Schematic representation of the GAM folded precur- 
sor, beginning 5' end (beginning of upper row) to 3' end 
(beginning of lower row), where the hairpin loop is
positioned at the right part of the drawing; and

[0497] Table 5 comprises data relating to GAM oligonucleotides 
of the present invention, and contains the following fields: 
GAM NAME: Rosetta Genomics Ltd. nomenclature (see be- 



low); GAM ORGANISM: identity of the organism encoding 
the GAM oligonucleotide; GAM RNA SEQUENCE: Sequence 
(5' to 3') of the mature, "diced" GAM RNA; PRECUR SEQ-ID: 
GAM precursor Seq-ID, as in the Sequence Listing; GAM 
POS: Dicer-cut location (see below); and 

[0498] Table 6 comprises data relating SEQ ID NO of the GAM 
target gene binding site sequence to TARGET gene name 
and target binding site sequence, and contains the follow- 
ing fields: TARGET BINDING SITE SEQ-ID: Target binding 
site SEQ ID NO, as in the Sequence Listing; TARGET
ORGANISM: identity of the organism encoding the TARGET gene;
TARGET: GAM target gene name; TARGET BINDING SITE
SEQUENCE: Nucleotide sequence (5' to 3') of the target
binding site; and 

[0499] Table 7 comprises data relating to target-genes and bind- 
ing sites of GAM oligonucleotides of the present inven- 
tion, and contains the following fields: GAM NAME: 
Rosetta Genomics Ltd. nomenclature (see below); GAM 
ORGANISM: identity of the organism encoding the GAM 
oligonucleotide; GAM RNA SEQUENCE: Sequence (5' to 3')
of the mature, "diced" GAM RNA; TARGET: GAM target
gene name; TARGET REF-ID: for human target genes, the
target accession number (RefSeq, GenBank); otherwise,
the location of the target gene on the genome annotation;
TARGET ORGANISM: identity of the organism encoding the
TARGET gene; UTR: Untranslated region of binding site/s (3'
or 5'); TARGET BS-SEQ: Nucleotide sequence (5' to 3') of
the target binding site; BINDING SITE-DRAW: Schematic
representation of the binding site, where the upper row
represents the 5' to 3' sequence of the TARGET and the lower
row represents the 3' to 5' sequence of the GAM RNA; GAM
POS: Dicer-cut location
(see below); and 
[0500] Table 8 comprises data relating to functions and utilities 
of novel GAM oligonucleotides of the present invention, 
and contains the following fields: GAM NAME: Rosetta
Genomics Ltd. nomenclature (see below); GAM RNA
SEQUENCE: Sequence (5' to 3') of the mature, "diced" GAM
RNA; GAM ORGANISM: identity of the organism encoding
the GAM oligonucleotide; TARGET: GAM target gene name;
TARGET ORGANISM: identity of the organism encoding the
TARGET gene; GAM FUNCTION: Description of the GAM
functions and utilities; GAM POS: Dicer-cut location (see
below); and

[0501] Table 9 comprises references of GAM target genes and
contains the following fields: TARGET: Target gene name;
TARGET ORGANISM: identity of the organism encoding the
TARGET gene; REFERENCES: references relating to the target
gene; and

[0502] Table 10 comprises data relating to novel GR (Genomic 
Record) polynucleotides of the present invention, and 
contains the following fields: GR NAME: Rosetta Genomics 
Ltd. nomenclature (see below); GR ORGANISM: identity of 
the organism encoding the GR polynucleotide; GR DE- 
SCRIPTION: Detailed description of a GR polynucleotide, 
with reference to Fig. 9; and 

[0503] Table 11 comprises data of all sequences printed on the
microarray of the microarray experiment, as described
hereinabove with reference to Fig. 17, and includes the
following fields: PROBE SEQUENCE: the sequence that was
printed on the chip; PROBE TYPE: as described in detail in
Fig. 17 in the chip design section and summarized as follows:
Known: published miRNA sequence; Known_mis1: similar
to published miRNA sequence, but with 1 mismatch
mutation on the miRNA sequence; Known_mis2: similar to
published miRNA sequence, but with 2 mismatch
mutations on the miRNA sequence; Known_mis3: similar to
published miRNA sequence, but with 3 mismatch
mutations on the miRNA sequence; Known_mis4: similar to
published miRNA sequence, but with 6 mismatch
mutations on regions other than the miRNA sequence;
Predicted: predicted GAM RNA sequences; Mismatch:
sequences that are similar to predicted GAM RNA sequences
but with 2 mismatches; Edges1: left half of GAM RNA
sequences; Edges2: right half of GAM RNA sequences
extended with its hairpin precursor (palindrome); Control1:
negative control; Control2: random sequences; Control3:
tRNA; Control4: snoRNA; Control5: mRNA; Control6:
other; GAM RNA SEQ ID/MIR NAME: GAM oligonucleotide
using Rosetta Genomics Ltd. nomenclature (see below) or
published miRNA oligonucleotide terminology; GAM RNA
SEQUENCE: Sequence (5' to 3') of the mature, "diced" GAM
RNA; LIBRARY: the library name as defined in Fig. 17C;
SIGNAL: Raw signal data for library; BACKGROUND
Z-SCORE: Z-score of probe signal with respect to
background, negative control signals; MISMATCH Z-SCORE:
Z-score of probe signal with respect to its mismatch probe
signal; and

[0504] Table 12 lists the GAM oligonucleotide sequences in- 
cluded in the present invention that were validated by lab- 
oratory means. For validated sequences of the present in- 
vention with more than one SEQ ID, the SEQ ID listed in 
the table may be arbitrarily chosen. The table includes the 



following fields: SEQUENCED: GAM oligonucleotides that 
were sequenced, as described hereinabove with reference 
to Fig. 13, are denoted by "1"; CHIP EXPRESSION: GAM 
oligonucleotide sequences that were validated by microar- 
ray experiments, as described hereinabove with reference 
to Figs. 17A-C and 18A-C, are denoted by either "1" or 
"2". A "chip expression" value of 2 refers to GAM oligonu- 
cleotide sequences whose intensity is more than 6 stan- 
dard deviations above the background intensity and 2 
standard deviations above the intensity of the mismatch 
probe. A "chip expression" value of 1 refers to miRNA
oligonucleotide sequences whose intensity was more than
4 standard deviations above the background intensity.
Note that some miRNA oligonucleotide sequences were
validated by both microarray experiments and sequenc- 
ing; SIGNAL: a raw signal data; BACKGROUND Z-SCORE: a 
Z-score of probe signal with respect to background, neg- 
ative control signals; MISMATCH Z-SCORE: a Z-score of 
probe signal with respect to its mismatch probe signal; 
and 

[0505] Table 13 comprises sequence data of GAMs associated 

with different viral infections. Each row refers to a specific
viral infection, and lists the SEQ ID NOs of GAMs that
target genes associated with that viral infection. The table
contains the following fields: ROW#: index of the row 
number; INFECTION NAME: name of the infecting organ- 
ism; and SEQ ID NOs OF GAMS ASSOCIATED WITH INFEC- 
TION: list of sequence listing IDs of GAMs targeting genes 
that are associated with the specified infection. 

[0506] Table 14 lists HIV-1 GAM oligonucleotides detected by the
bioinformatics detection engine 100 of the present
invention and includes the following fields: GAM PRECURSOR
SEQUENCE: Nucleotide sequence of the GAM precursor; GAM
RNA SEQ: Nucleotide sequence of the GAM RNA; SOURCE:
Source accession number encoding the GAM
oligonucleotide; SRC-START OFFSET: Start offset of GAM
precursor sequence relative to the SOURCE; STR: Orientation of
the strand, "+" for the plus strand, "-" for the minus
strand; TARGET: Target gene name; TARGET ORGANISM:
organism encoding the target; TAR-REF ID: for human
target genes, the target accession number (RefSeq, GenBank);
otherwise, the location of the target gene on the genome
annotation; BINDING SITE SEQUENCE: Nucleotide
sequence of the target binding site.

[0507] The following conventions and abbreviations are used in
the tables: The nucleotide "U" is represented as "T" in the
tables.

[0508] GAM NAME or GR NAME are names for nucleotide
sequences of the present invention given by the Rosetta
Genomics Ltd. nomenclature method. All GAMs/GRs are
designated by GAMx/GRx where x is a unique ID.

[0509] GAM POS is a position of the GAM RNA on the GAM
PRECURSOR RNA sequence. This position is the Dicer-cut
cation: A indicates a probable Dicer-cut location; B indi- 
cates an alternative Dicer-cut location. 

[0510] All human nucleotide sequences of the present invention 
as well as their chromosomal location and strand
orientation are derived from sequence records of UCSC-hg16
version, which is based on NCBI, Build34 database (April, 
2003). 

[0511] All viral sequences of the present invention as well as their 
genomic location are derived from NCBI, RefSeq database. 



